I. INTRODUCTION


This final report on project NAS 2-56^3*, Research in Sequential
Decoding, consists of two main portions: the results of Phase I and of
Phase II of our work.

Phase I deals with problems of reliable transmission through noisy
space channels and is subdivided into four areas: A. Work on sequential
decoding in general and the Stack algorithm in particular. B. Work on
the Bootstrap Hybrid Scheme. C. Development of good convolutional codes.
D. Development of a new bootstrapping hybrid approach to the Viterbi
decoding algorithm.

Phase II of the project deals with problems of encoding of space
sources for the purpose of data compression. It is subdivided into two
areas: A. Work on tree encoding with a fidelity criterion. B. Work on
permutation encoding with a fidelity criterion.

This report is written according to the above outline. A substantial
portion of it has already been presented in the three preceding quarterly
progress reports. We follow the precedent established there: the results
are summarized and their implications are discussed in the body of the
report, but details are left for the Appendices.



II. REPORT ON PHASE I 
II-A. Work on Sequential Decoding

II-A-1. Path Specifications in Terms of Parity Digits

In this section we describe how parity digits of binary convolutional
codes should be used to speed up sequential decoding by both the Fano
and the Stack algorithms. We show what information ought to be saved so
that the decoded message sequence can be recovered by the user. We
confine ourselves to rate 1/2 codes, but generalization to rate 1/n
codes is very simple.

Let $G(D)$ of degree $\nu-1$ be a binary convolutional generator, and
let $S(D)$ be the input information sequence. The output sequence is
then given by

$$X(D) = G(D)\,S(D) = \sum_{i=0}^{\infty} s_i D^i G(D). \qquad (1)$$

The digital circuit corresponding to (1) is given in Figure 1a. The
contents of the shift register stages $p_i$ are 0's at time 0. Let
$P^n(D) = \sum_{i=0}^{\nu-2} p_i^n D^i$ be the shift register state
sequence after $s_{n-1}$ has been inserted. Then the output at time
$n+1$ is

$$x_n = s_n + p_0^n \qquad (2)$$

and in general, all the future outputs depend only on the state
sequence $P^n(D)$ and on the future inputs $s_n, s_{n+1}, \ldots$:

$$\sum_{i=0}^{\infty} x_{n+i} D^i = \sum_{j=0}^{\nu-2} p_j^n D^j + \sum_{j=0}^{\infty} s_{n+j} D^j G(D). \qquad (3)$$
The realization of Figure 1a is particularly convenient for digital
computer implementation. Let $G^*(D)$ be defined by

$$G(D) = g_0 + D\,G^*(D). \qquad (4)$$

Then

$$P^n(D) = D^{-1} P^{n-1}(D) + p_0^{n-1} D^{-1} + s_{n-1} G^*(D) \qquad (5)$$

with $P^0(D) = 0$. It follows from (2) and (5) that if the parity sequence
$P^n(D)$ and the truncated generator sequence $G^*(D)$ are stored in index
registers, then if $s_n = 1$, the output will be the complement $p_0^n + 1$
of the rightmost stage of the parity register, and the next parity register
contents will be obtained by a right shift of that register followed by an
exclusive-or into it of the contents of the generator register.
Similarly, if $s_n = 0$ then $x_n = p_0^n$ and the next parity register
contents are obtained by a right shift of the former contents. It follows
that as long as $\nu-1$ does not exceed the size of the computer register,
the number of operations necessary to generate $X(D)$ does not grow with
$\nu$.
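The shift-and-exclusive-or procedure implied by (2) and (5) can be sketched in a few lines. This is an illustration only, not the program used in this work: the function names, the packing of $P^n(D)$ and $G^*(D)$ into Python integers (bit $i$ holds the coefficient of $D^i$), and the example generator are assumptions. A direct convolution is included as a cross-check.

```python
def encode_with_parity_register(bits, g):
    """Encode with one generator using the parity register of (2) and (5).

    g is the coefficient list [g_0, ..., g_{nu-1}] with g_0 = 1 (assumed).
    """
    # G*(D) packed into an integer: bit i holds g_{i+1}.
    g_star = sum(c << i for i, c in enumerate(g[1:]))
    p = 0  # parity register P^n(D), initially all zeros
    out = []
    for s in bits:
        out.append(s ^ (p & 1))  # x_n = s_n + p_0^n, equation (2)
        p >>= 1                  # right shift drops p_0^n
        if s:
            p ^= g_star          # exclusive-or of the generator register
    return out


def encode_direct(bits, g):
    """Reference encoder: coefficient-by-coefficient product G(D) S(D)."""
    out = []
    for n in range(len(bits)):
        x = 0
        for i, c in enumerate(g):
            if c and n - i >= 0:
                x ^= bits[n - i]
        out.append(x)
    return out


if __name__ == "__main__":
    g = [1, 0, 1, 1]  # hypothetical generator 1 + D^2 + D^3
    message = [1, 0, 1, 1, 0, 0, 1, 0, 1]
    print(encode_with_parity_register(message, g))
    print(encode_direct(message, g))
```

Each encoded digit costs one shift and at most one exclusive-or regardless of the generator length, which is the point made in the paragraph above.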

In sequential decoding (this applies to both the Fano and the Stack
algorithms), one must store as much information about a path being worked
on as would be necessary for recovery of the message sequence corresponding
to it. This is so because the path may become the decoded one, in which
case its message sequence must be supplied to the user. We will now show
how $s_0, s_1, \ldots, s_{n-1}$ may be recovered from $P^n(D)$ and
$p_0^{n-1}, p_0^{n-2}, \ldots, p_0^{\nu-1}$ provided $g_{\nu-1} = 1$
(which is so without loss of generality). In fact, since
$D^{-1}P^{n-1}(D) + p_0^{n-1}D^{-1}$ is of degree $\nu-3$, it follows
from (5) that

$$s_{n-1} = p_{\nu-2}^n. \qquad (6)$$

Furthermore, using (6), for all $n = 1, 2, \ldots$

$$P^{n-1}(D) = D\,P^n(D) + p_0^{n-1} + p_{\nu-2}^n\,D\,G^*(D). \qquad (7)$$

Thus both $s_{n-1}$ and $P^{n-1}(D)$ can be obtained from $P^n(D)$ and
$p_0^{n-1}$. By recursion therefore, $P^n(D), p_0^{n-1}, \ldots, p_0^0$
determine $s_{n-1}, s_{n-2}, \ldots, s_0$.
However, it follows directly from Figure 1 that for $k = 1, 2, \ldots$

$$P^k(D) = \left\lfloor D^{-(k-1)} \sum_{i=0}^{k-1} s_i D^i G^*(D) \right\rfloor \qquad (8)$$

where $\lfloor\ \rfloor$ denotes the operation of dropping all negative
degree terms. Since $g_{\nu-1} = 1$ then

$$s_{\nu-2} = p_{\nu-2}^{\nu-1}. \qquad (9)$$

Let $R^{\nu-1}(D) = P^{\nu-1}(D)$ and for $k = 1, 2, \ldots, \nu-1$,

$$R^{k-1}(D) = D\left[R^k(D) + r_{\nu-2}^k\,G^*(D)\right] \bmod D^{\nu-1}; \qquad (10)$$

then it follows from (8) and (10) that

$$s_i = r_{\nu-2}^{i+1} \quad \text{for } i = 0, 1, \ldots, \nu-2. \qquad (11)$$

Thus $s_0, \ldots, s_{\nu-2}$ may be recovered from $P^{\nu-1}(D)$, so that
$P^n(D), p_0^{n-1}, \ldots, p_0^{\nu-1}$ alone determine
$s_{n-1}, s_{n-2}, \ldots, s_0$ as asserted. Figure 1b shows the digital
circuit that does the job. It has the structure that performs according
to (6) and (7). However, if we feed into it the sequence

$$p_0^{n-1},\ p_0^{n-2},\ \ldots,\ p_0^{\nu-1},\ \underbrace{0, \ldots, 0}_{\nu-1 \text{ times}} \qquad (12)$$

then after $n-\nu+1$ shifts the state sequence will be $R^{\nu-1}(D)$, and
after $n-\nu+i$ shifts it will be $R^{\nu-i}(D)$. The outputs will be
$s_{n-1}, s_{n-2}, \ldots, s_0$ as indicated. The computer implementation
of the process of Figure 1b is similar to that of Figure 1a. It should be
observed that it is possible to recover $s_{n-1}, s_{n-2}, \ldots, s_{n-k}$
from $P^n(D)$ if we feed the sequence
$p_0^{n-1}, p_0^{n-2}, \ldots, p_0^{n-k+\nu-2}, 0, \ldots, 0$ into Figure 1b.
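The recovery procedure of (6), (7) and (12) can be sketched in software as follows. This is a minimal illustration under stated assumptions, not the circuit of Figure 1b itself: the function names, the integer bit-packing of the registers (bit $i$ holds the coefficient of $D^i$), and the example generator are all hypothetical. The decoder side uses only the final register $P^n(D)$ and the stored digits $p_0^{n-1}, \ldots, p_0^{\nu-1}$; zeros are fed for the last $\nu-1$ steps, as in (12).

```python
def encode(bits, g):
    """Forward pass per (2) and (5).

    Returns (outputs, p0_history, final_register), where p0_history[n]
    is p_0^n, the rightmost register stage just before s_n is inserted.
    """
    g_star = sum(c << i for i, c in enumerate(g[1:]))
    p, out, p0_hist = 0, [], []
    for s in bits:
        p0_hist.append(p & 1)
        out.append(s ^ (p & 1))
        p = (p >> 1) ^ (g_star if s else 0)
    return out, p0_hist, p


def recover_message(p_final, p0_hist, g):
    """Backward pass per (6), (7), (10) and (12).

    Assumes g_0 = g_{nu-1} = 1, nu >= 2, and a path of depth n >= nu - 1.
    """
    nu = len(g)
    n = len(p0_hist)
    if g[0] != 1 or g[-1] != 1 or nu < 2 or n < nu - 1:
        raise ValueError("unsupported generator or path length")
    g_star = sum(c << i for i, c in enumerate(g[1:]))
    mask = (1 << (nu - 1)) - 1          # keep degrees 0 .. nu-2 only
    p = p_final
    recovered = []
    for m in range(n, 0, -1):
        s = (p >> (nu - 2)) & 1         # s_{m-1} = p_{nu-2}^m, equation (6)
        recovered.append(s)
        # Stored parity digit while available, then zeros as in (12).
        p0 = p0_hist[m - 1] if m - 1 >= nu - 1 else 0
        # Equation (7) (or (10) once zeros are being fed).
        p = ((p << 1) ^ ((g_star << 1) if s else 0) ^ p0) & mask
    recovered.reverse()
    return recovered


if __name__ == "__main__":
    g = [1, 0, 1, 1]  # hypothetical generator 1 + D^2 + D^3
    message = [1, 0, 1, 1, 0, 0, 1, 0, 1, 1, 1, 0]
    _, history, final_state = encode(message, g)
    print(recover_message(final_state, history, g) == message)
```

Only the digits $p_0^{\nu-1}, \ldots, p_0^{n-1}$ of the history are read by `recover_message`; the earlier ones are ignored, which mirrors the storage saving claimed in the text.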

We shall now apply the above results first to Fano decoding and
then to Stack decoding. In Fano decoding it is necessary to generate
both $P^{n-1}(D)$ and $P^{n+1}(D)$ from $P^n(D)$ and, when receding, to
find the likelihood of the preceding node. Consider a rate 1/2 code with
generators

$$G_1(D) = \sum_{i=0}^{\lambda-1} g_{1,i} D^i = 1 + D\,G_1^*(D)$$
$$G_2(D) = \sum_{i=0}^{\nu-1} g_{2,i} D^i = 1 + D\,G_2^*(D) \qquad (13)$$

where $\lambda < \nu$ and $g_{1,0} = g_{2,0} = g_{1,\lambda-1} = g_{2,\nu-1} = 1$.
For a systematic code, $\lambda = 1$ and $G_1^*(D) = 0$. The coder outputs
are

$$X_1(D) = G_1(D)\,S(D) = \sum_{i=0}^{\infty} x_{1,i} D^i$$
$$X_2(D) = G_2(D)\,S(D) = \sum_{i=0}^{\infty} x_{2,i} D^i \qquad (14)$$

If $Y_1(D)$ and $Y_2(D)$ are the corresponding received sequences (which
need not be binary) then a likelihood of a branch at depth $n$ is given by

$$\lambda(x_{1,n}, y_{1,n}) + \lambda(x_{2,n}, y_{2,n}).$$
Therefore, for fast retreat, it would be useful if the decoder, located
at depth $n$, stored the cumulative likelihood $L_n$, the sequences
$x_{1,0}, \ldots, x_{1,n-1}$ and $x_{2,0}, \ldots, x_{2,n-1}$ (as well as
the received sequences $Y_1(D)$ and $Y_2(D)$) and the parity sequences

$$P_1^n(D) = \sum_{j=0}^{\lambda-2} p_{1,j}^n D^j \quad \text{and} \quad P_2^n(D) = \sum_{j=0}^{\nu-2} p_{2,j}^n D^j. \qquad (15)$$

When advancing along a branch pertaining to $s_n$, the decoder generates

$$x_{i,n} = s_n + p_{i,0}^n, \qquad P_i^{n+1}(D) = D^{-1} P_i^n(D) + p_{i,0}^n D^{-1} + s_n G_i^*(D) \qquad (16)$$

for $i = 1, 2$. This is accomplished by two circuits similar to that of
Figure 1a. It stores $x_{1,n}, x_{2,n}$ and replaces $P_i^n(D)$ by
$P_i^{n+1}(D)$ for $i = 1, 2$. Finally, it replaces $L_n$ by

$$L_{n+1} = L_n + \lambda(x_{1,n}, y_{1,n}) + \lambda(x_{2,n}, y_{2,n}). \qquad (17)$$

When retreating, the decoder replaces $L_n$ by

$$L_{n-1} = L_n - \lambda(x_{1,n-1}, y_{1,n-1}) - \lambda(x_{2,n-1}, y_{2,n-1}) \qquad (18)$$

and $P_i^n(D)$ by

$$P_i^{n-1}(D) = D\,P_i^n(D) + x_{i,n-1} + p_{i,k_i}^n \left[1 + D\,G_i^*(D)\right] \qquad (19)$$

where $k_1 = \lambda-2$, $k_2 = \nu-2$. Finally, it erases $x_{1,n-1}$ and
$x_{2,n-1}$ from its storage. The operation (19) is accomplished by the
circuit of Figure 2a. If the code is systematic then $P_1^n(D) = 0$ for
all $n$ and $x_{1,n} = s_n$. If the code is non-systematic then
$s_0, s_1, \ldots$ must somehow be recovered for the user. There are two
ways to do this: either at the end of the block, by feeding the stored
output digits in reverse order through the circuit of Figure 2a for
$i = 1$, or by forward generation using the circuit of Figure 2b that
corresponds to $1/G_1(D)$. This latter method has the advantage that
information may be released to the user before the block is entirely
decoded.
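The advance step (16) and the retreat step (19) are inverses of one another, which a short sketch can confirm. It is illustrative only: the function names, the packing of $G_i(D)$ and $P_i^n(D)$ into integers (bit $j$ holds the coefficient of $D^j$), and the example generator are assumptions, and a single generator with $g_0 = g_{\nu-1} = 1$ is treated.

```python
def advance(p, s, g_full):
    """One forward step, equation (16). Returns (x_n, P^{n+1})."""
    x = s ^ (p & 1)                              # x_n = s_n + p_0^n
    p_next = (p >> 1) ^ ((g_full >> 1) if s else 0)  # g_full >> 1 is G*(D)
    return x, p_next


def retreat(p, x_prev, g_full, nu):
    """One backward step, equations (6) and (19).

    Returns (s_{n-1}, P^{n-1}) from P^n and the stored digit x_{n-1}.
    """
    s_prev = (p >> (nu - 2)) & 1                 # top stage gives s_{n-1}
    # D P^n + x_{n-1} + s_{n-1} G(D); the degree nu-1 terms cancel.
    p_prev = (p << 1) ^ x_prev ^ (g_full if s_prev else 0)
    return s_prev, p_prev


if __name__ == "__main__":
    g_full, nu = 0b1101, 4   # hypothetical generator 1 + D^2 + D^3
    message = [1, 0, 1, 1, 0, 1, 0, 0, 1]
    p, outputs = 0, []
    for s in message:
        x, p = advance(p, s, g_full)
        outputs.append(x)
    # Retreat all the way to the origin, reading the message backwards.
    backwards = []
    for n in range(len(message), 0, -1):
        s_prev, p = retreat(p, outputs[n - 1], g_full, nu)
        backwards.append(s_prev)
    print(backwards[::-1] == message, p == 0)
```

Because the retreat uses the stored output digit $x_{i,n-1}$ rather than $p_{i,0}^{n-1}$, it is exact at every depth, including the first $\nu-1$ branches.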

In stack decoding one does not recede, so there is no sense in
storing $x_{1,n}$ and $x_{2,n}$. However, it is essential to conserve
storage as far as possible. Therefore, a stack entry corresponding to a
path of depth $n$ ought to contain the sequences $P_1^n(D)$ and $P_2^n(D)$
as well as pointers to its past
$p_{1,0}^{n-1}, p_{1,0}^{n-2}, \ldots, p_{1,0}^{\lambda-1}$. $P_1^{n+1}(D)$
and $P_2^{n+1}(D)$ are obtained by use of circuits like Figure 1, and the
decoded sequence is obtained at the end from a circuit of Figure 1b. Of
course, if the code is systematic, then $P_1^n(D) = 0$ and one saves
$s_{n-1}, s_{n-2}, \ldots, s_0$ instead of
$p_{1,0}^{n-1}, \ldots, p_{1,0}^{\lambda-1}$.



II-A-2. Maintenance and Purging of the Stack and the Associated Map
for the Stack Decoding Algorithm

In the Stack algorithm, the Stack entries must contain the information
about the corresponding path that is necessary to extend the latter and
to determine the corresponding message sequence (in case the path is
chosen as the decoded one). In the preceding section we have shown that
it is advantageous if each Stack entry contains (if R = 1/2) the two
parity sequences $P_1^n(D)$ and $P_2^n(D)$ and either the past parity
sequence $p_1^n = p_{1,0}^{n-1}, p_{1,0}^{n-2}, \ldots, p_{1,0}^{\lambda-1}$
or the past information sequence $s^n = s_{n-1}, s_{n-2}, \ldots, s_0$
(the two are identical for systematic codes). We will deal here with
$s^n$. Remarks about $p_1^n$ would be similar and they are made wherever
necessary in Appendix 1.

Since $s^n$ is only needed at the end of and not during the decoding
process, access to it need not be a fast one. Thus, as described in
reference [1], the various $s^n$ sequences are specified in a linked
map, and the appropriate one is linked to the Stack entry by a pointer.
The map specification itself takes advantage of the tree structure of
the code.

The map must contain at all times the specification of all paths
corresponding to "live" entries in the stack. Since the stack is
finite, it is purged according to the principle "least likelihood first."
The map may contain some paths no longer in the stack, but efficient
storage use requires that there be as few dead paths as possible. Hence
the need for map purging. A report [1] by the author describes how map
purging can be carried out in a manner directly dependent on stack purging,
but the method requires establishment of counters for every live map
branch whose content indicates the number of live paths that have that
branch in common. New map management strategies were developed that
do not require any counters.

The first two strategies are for a map that specifies paths by
linking positions of 1-branches to preceding 1-branch positions.
E.g., the path 100110100 is given by the linked position arrangement
7 -> 5 -> 4 -> 1 -> $-(\nu-1)$ ($\nu$ is the code constraint length and
all paths are linked to position $-(\nu-1)$ by convention). The purging
principle of the first strategy is as follows: a branch can be
eliminated from the map if its depth is $t$ less than the depth of the
path on top of the stack and if that path leads through that branch. Of
course, it is understood that if the furthest depth of advance in the
tree is $I_{MAX}$ then all information digits up to depth $I_{MAX} - t$
have been definitely decided. The value of $t$ must be chosen so that the
probability of erroneous premature decision is sufficiently low.
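The linked-position representation in the example above can be illustrated with a short sketch. It only shows the encoding of a single path and its inverse; the function names are hypothetical, positions are counted from 1 as in the example, and the sharing of common prefixes among paths (the actual purpose of the map, detailed in Appendix 1) is not modelled.

```python
def linked_positions(path, nu):
    """Positions of the 1-branches of a path, most recent first,
    terminated by the conventional position -(nu - 1)."""
    ones = [i + 1 for i, digit in enumerate(path) if digit == "1"]
    return ones[::-1] + [-(nu - 1)]


def path_from_links(links, depth):
    """Rebuild the information digits of a path of the given depth."""
    digits = ["0"] * depth
    for position in links[:-1]:      # skip the terminating position
        digits[position - 1] = "1"
    return "".join(digits)


if __name__ == "__main__":
    nu = 5  # hypothetical constraint length, for illustration only
    links = linked_positions("100110100", nu)
    print(links)
    print(path_from_links(links, 9))
```

With this representation the storage per path grows with the number of 1-branches rather than with the path depth.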

It may also be desirable to make final decoding decisions according
to a rule other than the $I_{MAX} - t$ depth rule. For instance, let
$L_1, L_2, \ldots, L_k$ be the cumulative likelihood values at depths
$1, 2, \ldots, k$ of a path of depth $k$ that is on top of the stack.
Then one might decide all information digits up to depth $m$ where



and $T$ is some suitable fixed threshold. The second strategy purges all
map positions of depth $m$ or less where the value of $m$ is determined
by any arbitrary rule ($m$ is, of course, a non-decreasing function of
time). This strategy does require the establishment of additional arrays
in storage.

Finally, the third maintenance and purging strategy applies to maps
whose paths are specified by sequences of information digits. The stack
has locations M1 containing $j(\nu-1)+k$ binary digits ($\nu$ is the
constraint length of the code and $k$ is arbitrary), the right-most being
the most recent one. It has a counter indicating the depth of the path
and a pointer P1 to the location in the map that contains the preceding
path sequence of length $k$. The map has locations M2 of $k$ digits,
pointers MPP indicating the location of the preceding path sequence,
and pointers MPL linking all M2 locations that correspond to the
same path depth. There are also pointers to the first and last map
locations of any given live depth (a fixed number $j$ of depths are
live at any time) and a pointer to the first free (or replaceable) map
location. If $I_{MAX}$ is the depth of deepest penetration in the tree,
then the purging strategy assumes that the map will contain no locations
referring to depths preceding $I_{MAX} - (\nu - 1) - jk$.

The details of the three strategies are described in Appendix 1.
II-A-3. Multibranch Advance through the Tree of the Decoding Algorithm

Rate 1/2 binary codes have $2^\theta$ branches leaving every node, each
branch containing $2\theta$ digits. In practice only codes with
$\theta = 1$ are used, since an advance by one node involves finding the
branch whose likelihood is the largest. The straightforward way of doing
this is to evaluate each of the $2^\theta$ likelihoods and then order
them. This is too large an undertaking. However, if the branch could be
looked up directly in a moderate size table, making $\theta > 1$ would
speed up both Fano and stack decoding appreciably. Furthermore,
simulation has shown that the needed stack size could also be
substantially reduced.

In Appendix 2 we show how such tables can be constructed for binary
input symmetric channels. The table size grows as $K 2^{2\theta}$. The
coefficient $K$ is larger for non-systematic codes for the BSC than for
systematic ones, and an extension is more cumbersome. For a channel with
2 inputs and $2^j$ outputs the table sizes are also of size
$K 2^{2\theta}$, but exact likelihood ordering is not possible. However,
the approximation seems sufficiently close to make the procedure a
worthwhile one.



II-B. Work on Bootstrap Hybrid Decoding

II-B-1. Simulations of Bootstrap Hybrid Decoding over the BSC

Appendix 2 contains a detailed description of three (progressively
more sophisticated) bootstrap hybrid decoding algorithms as used over
the BSC. The first is the rudimentary algorithm in which the binary
channel state stream is modified only if some received stream is completely
decoded. The second is the pull-up algorithm where the state stream is
modified even after partial decoding of some stream. Specifically, if
the furthest advance along a stream is to depth $I_{MAX}$ then all digits
up to depth $I_{MAX} - J$ are considered definitely decoded and the state
stream is therefore modified up to depth $I_{MAX} - J$. Finally, the
two-way algorithm is the pull-up algorithm with the added feature that
attempts at stream decoding are made in both forward and backward
directions. It is based on the observation of Dr. Dale Lumb that it is
possible to decode a convolutional code backward as well as forward,
provided each string of $r$ information symbols is terminated by $\nu-1$
dummy bits known to the decoder. The bootstrap algorithm starts by
decoding forward in the pull-up mode and continues to do so until a full
decoding round takes place without completing any of the streams. In that
case decoding in the backward direction starts and continues until
another unsuccessful full decoding round occurs, in which case forward
decoding resumes, etc. A stack of 1000 entries is used and if succeeding
forward and backward rounds end without an advance of more than 20
branches on any stream in either direction, the stack is increased to
8000 entries for the next two rounds.
Table 1 contains a summary of three randomly selected decoding runs
that use the rudimentary bootstrap hybrid decoding scheme with a
convolutional code of rate R = .5 over a BSC with crossover probability
p = .07 ($R_{comp}$ = .4). Stack decoding is utilized. We use m = 10
streams so the net rate is .45. Other parameters of interest are block
length r = 1000, block termination t = 25, and number of decoding steps
allowed on a stream M = 5000. The printout indicates which of the
m = 10 streams was worked on (JNOW), how many decoding steps were taken
(N3), how deeply the decoder penetrated (IMAX) into the tree within
the N3 steps taken, and how many undecoded streams were left (KLEFT).
Finally, the speed factor (SF) is given for the entire block. SF is
defined as the ratio of the total number of decoding steps taken to the
number of information bits decoded. The Table shows quite clearly how
fast the remaining streams can be decoded once the first three or four
are known.
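As a worked check of the speed factor definition, the N3 column of Block 1 of Table 1 can be summed and divided by the 9000 information bits of a block (9 information streams of 1000 bits). The sketch below is illustrative; the function and variable names are not from the simulation program.

```python
INFO_BITS_PER_BLOCK = 9000  # 9 information streams of 1000 bits each


def speed_factor(steps_per_attempt, info_bits=INFO_BITS_PER_BLOCK):
    """Total decoding steps divided by information bits decoded."""
    return sum(steps_per_attempt) / info_bits


# N3 values of the twelve decoding attempts listed for Block 1 of Table 1.
block1_n3 = [5000, 5000, 4864, 1948, 1534, 3655,
             1320, 1849, 1495, 1178, 1350, 1079]

if __name__ == "__main__":
    print(sum(block1_n3), speed_factor(block1_n3))
```

The twelve attempts total 30272 steps, which gives a speed factor of about 3.36, in agreement with the value listed for that block.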

Table 2 shows decoding progress in a typical run of the pull-up
algorithm over a BSC with crossover probability p = .08. A convolutional
code of rate R = 1/2 was used and m = 10 streams formed a block. The
maximum allocation was M = 5000 and the stack had 700 entries.
The parameters JNOW, N3, IMAX, KLEFT, SF, and KTRY have the same
meaning as in Table 1. The value of JSTART indicates the depth of the
node at which the decoding of the particular stream began. The definitely
decoded back-up limit was J = 200. If in a decoding round no stream
advanced by more than 20 levels beyond its previous maximal depth, M
was temporarily increased to 20000 and the stack size to 8000 until
such an advance took place. This phenomenon can be observed in row 20
of the Table. It becomes apparent for the present example that without
bootstrapping it would be completely impossible to decode 9 of the 10
received streams of this block as the older Falconer scheme would require.
In fact, we were not able to decode the fifth stream without 27000 steps
even when using information from the decoded streams 2 and 7 and the
almost decoded stream 8! It seems fair to say that the Falconer scheme
could decode at most three of the ten received streams and no more.
Bootstrapping is no "endgame"; it does not complicate the decoding
search and ought to be used right from the start.

Table 3 shows an example of two-way decoding over a BSC with
crossover probability p = .09. The parameters JNOW, N3, IMAX, JSTART,
KLEFT, SF and KTRY have the meaning given them in Table 2, except that
when decoding is backward, nodes are numbered in reverse order so that
forward node 1000 is backward node 1, etc. (this affects IMAX and
JSTART). The parameter IFORW is 1 when forward decoding took place
and is 2 otherwise. The parameter KROUND indicates how many streams
were attempted in a given direction since the last successful decoding.
When its value reaches that of KLEFT, the decoding direction is reversed.

We have run all of our simulations using the stack decoding algorithm
applied to transmission of data over a binary symmetric channel with
crossover probability p. The systematic code of constraint length
$\nu$ = 72 whose taps in octal notation are 651102104421022041101101
(obtained by Costello [1969]) was used, the number of streams was m = 10
(this value was picked arbitrarily without any attempt at optimization)
and there were always 1000 true information bits per information stream
(i.e., 9000 bits per block).

Our simulation results are summarized in Table 4, which gives certain
parameters of interest that we now explain. For different crossover
probabilities we have used different bootstrapping algorithms. The
crossover probability p = .056 was chosen because the corresponding
channel has $R_{comp}$ = .45, which is equal to the net rate of our
scheme. Hence the dB gain over straight sequential decoding is 0. Figure 3
is based on 2000 blocks of data and shows the distribution of computation
per decoded information bit (speed factor) when the rudimentary algorithm
is used. As is usual, an extension of a node by the decoder serves as a
unit of computation, and the speed factor was obtained by simply dividing
by 9000 the total number of computations necessary for decoding of a
block (the rudimentary algorithm is a block scheme and it is not clear
how to assign particular decoding steps to particular information bits).

The startling result of this simulation is that if the tail behavior of
the distribution could be extrapolated as a straight line on the log-log
plot (which is certainly O.K. in sequential decoding) then the asymptotic
computational distribution would be

$$P[SF > x] \approx 1380\, x^{-12.8}.$$

This would mean that a speed factor of 5.17 would be needed only once in
$10^6$ blocks, and a speed factor of 8.92 only once in $10^9$ blocks!
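The two quoted speed factors follow from the extrapolated tail by direct substitution, as the short check below shows. The function names are illustrative; the coefficient 1380 and the exponent 12.8 are the fitted values stated above.

```python
def tail_probability(x, coeff=1380.0, exponent=12.8):
    """Extrapolated P[SF > x] = coeff * x ** (-exponent)."""
    return coeff * x ** (-exponent)


def speed_factor_exceeded_with(prob, coeff=1380.0, exponent=12.8):
    """Speed factor x at which the extrapolated tail equals prob."""
    return (coeff / prob) ** (1.0 / exponent)


if __name__ == "__main__":
    for x in (5.17, 8.92):
        print(x, tail_probability(x))
    for prob in (1e-6, 1e-9):
        print(prob, speed_factor_exceeded_with(prob))
```

The tail evaluates to roughly $10^{-6}$ at 5.17 and roughly $10^{-9}$ at 8.92, so the quoted figures are consistent with the fitted law to within rounding.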

However, a glance at Table 4 shows that the largest limiting exponent
(derived according to the analysis of reference [1]) can only be 2.74
and we are at this time at a loss to explain this discrepancy. The most
likely reason is insufficient statistics: 2000 sample points is not
enough.* It is difficult to extend the sample size substantially. 2000
blocks involve $18 \times 10^6$ bits and our Fortran algorithm took 80
minutes of IBM 360-91 computer running time. A similar discrepancy between
an observed and theoretical Pareto exponent was reported by Forney [2],
who did high-rate simulations of sequential decoding on the Gaussian
channel. In his case it turned out that a theoretical exponent of 0.087
was observed to have an experimental value in the range 0.38-0.41.

*Under this hypothesis the time distribution will assume its final slope
somewhere below the probability $10^{-3}$. The intriguing point is that
should this take place at a small enough probability then the practical
exponent might still be 12.8! Another cause for the anomaly might be the
various computation truncations inherent in our algorithm. We shall
investigate further and report more completely at a later date.

In any case, if the observed behavior can be extrapolated even
approximately then the bootstrapping algorithm may be used to great
advantage even at rates equal to $R_{comp}$ in order to stabilize the
decoding effort and prevent block erasures due to buffer overflow. It is
particularly interesting that in the 2000 blocks decoded, only one required
more than 12 attempts at stream decoding (the minimum is 9). The capacity
of this channel is C = .69, so R/C = .731, and we have entered this
point as a circle into the plot of Figure 5.
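The quoted channel parameters can be reproduced from the standard BSC expressions $C = 1 - H(p)$ and $R_{comp} = 1 - \log_2\left(1 + 2\sqrt{p(1-p)}\right)$. These closed forms are not written out in this report and are supplied here as an assumption about what was used; the function names are illustrative.

```python
from math import log2, sqrt


def bsc_capacity(p):
    """Capacity of a BSC with crossover probability 0 < p < 1, bits/use."""
    return 1.0 + p * log2(p) + (1.0 - p) * log2(1.0 - p)


def bsc_r_comp(p):
    """Computational cutoff rate R_comp of a BSC, 0 < p < 1."""
    return 1.0 - log2(1.0 + 2.0 * sqrt(p * (1.0 - p)))


if __name__ == "__main__":
    for p in (0.056, 0.07, 0.08, 0.09):
        c = bsc_capacity(p)
        # The convolutional code rate 0.5 is used for the ratio R/C.
        print(p, round(bsc_r_comp(p), 3), round(c, 3), round(0.5 / c, 3))
```

For p = .056 this gives $R_{comp}$ close to .45 and C close to .69, and for p = .07 it gives $R_{comp}$ close to .4, matching the values quoted in the text. With the convolutional code rate R = .5 the ratio R/C comes out near .788, .836 and .887 for p = .07, .08 and .09, close to the R/C entries of Table 4.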

We feel that about the noisiest BSC over which it is practicable to
run the rudimentary algorithm with stream length = 1000 bits is one
whose crossover probability is p = .07. Figure 4 displays the
corresponding computational distribution. Again, the apparent Pareto
exponent of 2.66 is larger than the theoretical maximum of 2.2. The
R/C parameter of this experiment is entered as a triangle in Figure 5.

As a next experiment we ran the pull-up algorithm over the BSC with
crossover probability p = .08. We used a stack with 1000 entries and
stopped computation on a stream either if it was decoded or if a stack
overflow took place. We considered permanently decoded all but the last
2000 bits of the path that was in the stack immediately before it
overflowed. This caused no errors in the 1000 blocks that we ran and
successfully decoded. If a round was completed without advancing the
decoding of any of the remaining streams by more than 20 branches then
the stack size was increased to 8000 for the next round. We did not
obtain an experimental distribution, but only the average and maximal
speed factors. The R/C parameter of this experiment is entered as
a square in Figure 5.

The final entry in Table 4 involves a BSC with crossover probability
p = .09 over which we ran a two-way algorithm.

Since two-way decoding uses more information than the one-way kind,
the bounds of reference [1] are not applicable to the former.
Nevertheless, the entry in the lower bound to Pareto exponent column of
Table 4 is derived according to the corresponding formula of reference [1].
Again, our simulation only determined the average and the maximal speed
factors based on a run of 500 blocks. The R/C parameter of this
experiment is entered as a star in Figure 5.
TABLE 1

Simulation Examples of Rudimentary Bootstrap Hybrid Decoding
at Net Rate .45, m = 10, over BSC with p = 0.07

Block 1

JNOW     N3   IMAX  KLEFT
   1   5000    473     10
   2   5000   1008     10
   3   4864    249     10
   4   1948   1025      9
   5   1534   1025      8
   6   3655   1025      7
   7   1320   1025      6
   8   1849   1025      5
   9   1495   1025      4
  10   1178   1025      3
   1   1350   1025      2
   2   1079   1025      1

SF = 3.36

Block 2

JNOW     N3   IMAX  KLEFT
   1   5000    842     10
   2   5000    749     10
   3   2610   1025      9
   4   5000   1010      9
   5   5000    929      9
   6   3735   1025      8
   7   2132   1025      7
   8   5000    948      7
   9   2553   1025      6
  10   5000    552      6
   1   1739   1025      5
   2   1863   1025      4
   4   1297   1025      3
   5   1160   1025      2
   8   1066   1025      1

Block 3

JNOW     N3   IMAX  KLEFT
   1   2524   1025      9
   2   4377    278      9
   3   5000    239      9
   4   4880   1025      8
   5   2288   1025      7
   6   3275   1025      6
   7   1659   1025      5
   8   1246   1025      4
   9   1320   1025      3
  10   1926   1025      2
   2   1074   1025      1

SF = 3.28
TABLE 2

A Simulation Example of Pull-up Bootstrap Hybrid Decoding
at Net Rate .45, m = 10, over BSC with p = 0.08

JNOW     N3   IMAX  JSTART  KLEFT
   1   2744    215       0     10
   2   2963   1025       0      9
   3   2891    219       0      9
   4   1858     93       0      9
   5   2314    141       0      9
   6   2207    192       0      9
   7   3447   1025       0      8
   8   5000    944       0      8
   9   2958    294       0      8
  10   2353    339       0      8
   1   2729    235      15      8
   3   2143    212      19      8
   4   3052    212       0      8
   5   2329    146       0      8
   6   2767    166       0      8
   8   3301    944     744      8
   9   2037    293      94      8
  10   2468    341     139      8
   1   2834    235      35      8
   5  27062    800       0      8
   6   3030    287       0      8
   8   3301    944     744      8
   9   2283    287      94      8
  10   2422    341     141      8
   1   5000    762      35      8
   3   2421   1025      19      7
   4   1322   1025      12      6
   5   2671    716     600      6
   6   2913    292      87      6
   8   2799    944     744      6
   9   5000    852      94      6
  10   2040   1025     141      5
   1    839   1025     562      4
   5    774   1025     600      3
   6    979   1025      92      2
   8    302   1025     744      1

SF = 13.05

KTRY = 36
TABLE 3

A Simulation Example of Two-way Bootstrap Hybrid Decoding
at Net Rate 0.45, m = 10, over BSC with p = 0.09

JNOW     N3   IMAX  JSTART  KLEFT  KROUND  IFORW
   1   5910    272       0     10       1      1
   2   5356    139       0     10       2      1
   3   5731    354       0     10       3      1
   4   7514    640       0     10       4      1
   5   4262    182       0     10       5      1
   6   5537    164       0     10       6      1
   7   3770    164       0     10       7      1
   8   6002    200       0     10       8      1
   9   5819    351       0     10       9      1
  10   8401    734       0     10      10      1
   1   5695    443       0     10       1      2
   2   6542    589       0     10       2      2
   3   8395    307       0     10       3      2
   4   3740    166       0     10       4      2
   5   4103    136       0     10       5      2
   6   4671    114       0     10       6      2
   7   4329    277       0     10       7      2
   8   6909    733       0     10       8      2
   9   5013    157       0     10       9      2
  10   5373    332       0     10      10      2
   1   3650    262      72     10       1      1
   2   3306   1071       0      9       0      1
   3   4589    388     154      9       1      1
   4   4149    651     440      9       2      1
   5   5443    228       0      9       3      1
   6   5440    254       0      9       4      1
   7  10265    950       0      9       5      1
   8   4095    224       0      9       6      1
   9   5738    351     151      9       7      1
  10   4480    722     534      9       8      1
   1   5751    244      72      9       9      1
   3   6377    309     107      9       1      2
   4   8493    278       0      9       2      2
   5   7566    715       0      9       3      2
   6   4788    373       0      9       4      2
   7    975   1071      77      8       0      2
   8   4283    874     533      8       1      2
   9   5202    443       0      8       2      2
  10   3805   1071     132      7       0      2
   1   4685    465     243      7       1      2
   3   3510   1071     109      6       0      2
   4   5007    443      78      6       1      2
   5   3260   1071     515      5       0      2
   6   1445   1071     173      4       0      2
   8    438   1071     674      3       0      2
   9   1230   1071     243      2       0      2
   1    779   1071     265      1       0      2

SF = 25.75

KTRY = 47



TABLE 4

Summary of Simulation Parameters for BSC, Net Rate 0.45, m = 10

Crossover probability p                           0.056    0.07     0.08     0.09
dB gain over straight sequential decoding         0.00     0.54     0.97     1.36
R/C                                               0.731    0.788    0.837    0.887
Pareto exponent, straight sequential decoding     1.0      0.75     0.55     0.41
Upper bound to Pareto exponent                    2.74     2.2      1.9      1.6
Lower bound to Pareto exponent                    2.25     1.5      1.2      0.81
Type of bootstrap algorithm                       rudim    rudim    pull up  two way
Experimental Pareto exponent                      12.8     2.66     -        -
Average speed factor                              1.535    4.23     7.00     22.00
Maximal speed factor                              3.93     16.3     24.5     100.0
Number of blocks                                  2000     500      1000     500

II-B-2. Bounds on Computing Effort for Bootstrap Hybrid Decoding on
Binary Input Channels

Let us generalize the encoding and decoding methods of Appendix 3
to channels symmetrical from the input that have two input symbols and
an arbitrary number $b$ ($> 3$) of output symbols. The encoding is one
involving $m-1$ information streams and an additional parity check
stream. Suppose we receive the $m$ streams and wish to decode the last
of them (this happens to make notation convenient and is without loss
of generality), $y_1(m), y_2(m), \ldots$ Since for every time interval
$i$ the receiver has at its disposal the vector

$$\mathbf{y}_i(m) \triangleq (y_i(1), y_i(2), \ldots, y_i(m)) \qquad (1)$$

the sequential decoder ought to calculate the likelihood function
$\lambda_m(i)$ at depth $i$ by the formula (capitals denote random
variables)

$$\lambda_m(i) = \log \frac{P[\mathbf{Y}_i(m) = \mathbf{y}_i(m) \mid X_i(m) = x_i(m)]}{P[\mathbf{Y}_i(m) = \mathbf{y}_i(m)]} - R \qquad (2)$$

where the algebraic constraint $\sum_{j=1}^{m} x_i(j) = 0$ is assumed to
hold and must be used when calculating the probabilities in the argument
of the logarithm.

It is shown in reference [3] that for the $j$-th received stream
the expression (2) can be simplified to have the form

$$\lambda_m(i) = 1 - R - \log\left[1 + Q(\mathbf{y}_i(m))\right] + \log\left\{ q(x_i(j) \mid y_i(j)) \left[ 1 + \frac{Q(\mathbf{y}_i(m))}{2q(x_i(j) \mid y_i(j)) - 1} \right] \right\} \qquad (3)$$
where $x_i(j) \in \{0, 1\}$,

$$q(x \mid y) = \frac{w(y \mid x)}{w(y \mid 0) + w(y \mid 1)} \qquad (4)$$

and

$$Q(\mathbf{y}_i(m)) = \prod_{j=1}^{m} \left[ q(0 \mid y_i(j)) - q(1 \mid y_i(j)) \right]. \qquad (5)$$

The above formula suggests an efficient instrumentation for hybrid decoding
of the class of channels considered. The state of the channel at the
various time instants is given by the sequence $Q(\mathbf{y}_1(m))$,
$Q(\mathbf{y}_2(m))$, $Q(\mathbf{y}_3(m)), \ldots$. In fact, except for
$Q(\mathbf{y}_i(m))$, the formula (3) is a function of events $x_i(j)$ and
$y_i(j)$ that themselves pertain to the $j$-th stream.

Thus, upon receiving the symbols that correspond to the m transmitted streams, the decoder will compute the channel state stream whose i-th entry $Q_i$ will be the number $Q(\mathbf{y}_i(m))$* (i.e. not a binary digit signifying the parity of the i-th position as before). Decoding will then proceed as outlined in Appendix 3, based on the likelihood function (3), until one of the streams, say the $j_1$-th, is decoded. The necessary recomputation of the channel state stream will simply consist of replacing the i-th entry $Q_i$ by its new value $Q_i' = Q_i / [2q(x_i(j_1)/y_i(j_1)) - 1]$, where $x_i(j_1)$ is the decoder's estimate of the i-th transmitted digit of the $j_1$-th stream. Decoding of the remaining m-1 streams will then start from the beginning and will continue to use the likelihood (3) based on the new state stream values $Q_i'$. When a stream, say the $j_2$-th, is decoded, the i-th state stream entries $Q_i'$ will be replaced by entries $Q_i'' = Q_i' / [2q(x_i(j_2)/y_i(j_2)) - 1]$, etc., until just one stream remains undecoded. The latter's identity will be determined from the parity constraint.

*Since the number of possible values of $Q_i$ is rather limited, the state stream would in practice contain only the address $A(Q_i)$ of a table entry containing the number $Q_i$. Or, even better, there would be a likelihood table whose entries would be formed from the value of the triplet $[x_i(j), y_i(j), A(Q_i)]$. The problem of limiting the size of such a table is discussed at the end of this section.
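The state stream computation and its update after a stream is decoded can be sketched as follows. This is a minimal illustration only: the function names and the layout `w[y][x]` for the transition probabilities w(y/x) are assumptions made for the example, not notation from the report. It evaluates q(x/y) of (4), the product Q of (5) for one time instant, and the division by 2q - 1 used when a decoded stream is removed.

```python
from math import prod

def posterior(x, y, w):
    """q(x/y) of eq. (4); w[y][x] is assumed to hold the transition probability w(y/x)."""
    return w[y][x] / (w[y][0] + w[y][1])

def channel_state(column, w):
    """Q of eq. (5) for one time instant; `column` holds y_i(1), ..., y_i(m)."""
    return prod(posterior(0, y, w) - posterior(1, y, w) for y in column)

def remove_decoded_stream(Q, x_hat, y, w):
    """State update after a stream is decoded: Q' = Q / [2 q(x_hat/y) - 1].

    x_hat is the decoder's estimate of the transmitted digit and y the received
    digit of the decoded stream. The division fails if q(x_hat/y) = 1/2.
    """
    return Q / (2.0 * posterior(x_hat, y, w) - 1.0)

if __name__ == "__main__":
    # Hypothetical BSC with crossover probability 0.1 and m = 3 streams.
    bsc = [[0.9, 0.1], [0.1, 0.9]]
    Q = channel_state([0, 1, 0], bsc)
    print(Q, remove_decoded_stream(Q, 1, 1, bsc))
```

In a practical decoder the entries would presumably be table addresses rather than floating-point numbers, as the footnote suggests.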

As mentioned in footnote * there might arise a problem of storing the state stream entries $Q_i$. Let us consider the case where the output alphabet size b is even. Since the channel is symmetric from the input, every digit y can be represented by a pair (u,v) where u is binary, $v \in \{0, 1, \ldots, (b/2)-1\}$ and

\[ w(u=0, v/x) = w(u=1, v/x \oplus 1) \tag{6} \]

for all $x \in (0,1)$ and v. It follows then that

\[ g(v) \triangleq q(0/u=0,v) - q(1/u=0,v) = -[q(0/u=1,v) - q(1/u=1,v)] \tag{7} \]

and therefore, letting



( 8 ) 


(9) 


\[ 2q(x/u,v) - 1 = q(0/x \oplus u, v) - q(1/x \oplus u, v) = (-1)^{x \oplus u}\, g(v) \]

then if $x_i(j_1)$ is the decoder's final estimate of the i-th transmitted digit on the $j_1$-th stream, $Q_i$ is to be replaced by its new value

\[ Q_i' = (-1)^{x_i(j_1) + u_i(j_1)}\, Q_i / g(v_i(j_1)) \tag{10} \]


If n(v) denotes the number of $v_i$'s whose value is v, then after k streams have been decoded, $Q_i$ will have the form

\[ Q_i = \pm \prod_{v=0}^{b/2} g(v)^{n(v)} \]

where g(b/2) = 1 and n(b/2) = m-k. Since

\[ \sum_{v=0}^{b/2} n(v) = m \tag{11} \]


it follows that $Q_i$ must have one of at most

\[ \binom{b/2 + m}{m} \]

values. Hence a complete likelihood table would be of size

\[ 2b \binom{b/2 + m}{m} \tag{12} \]
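The table size can be checked numerically. The sketch below assumes that (12) reads 2b times the binomial coefficient C(b/2 + m, m); that reading is an inference from the damaged formula, supported by the fact that it reproduces the two values quoted in the text for m = 10. The function name is hypothetical.

```python
from math import comb

def likelihood_table_size(b: int, m: int) -> int:
    """Assumed form of (12): 2b * C(b/2 + m, m), for an even output alphabet size b."""
    if b % 2:
        raise ValueError("output alphabet size b must be even")
    return 2 * b * comb(b // 2 + m, m)

if __name__ == "__main__":
    for bits in (2, 3):
        b = 2 ** bits
        print(f"{bits}-bit quantization, m = 10: {likelihood_table_size(b, 10)} entries")
```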


The values of (12) for a two bit and three bit output quantization with m = 10 are 528 and 16016, respectively. The latter figure certainly seems excessive and yet three bit quantization is used quite frequently. One possible remedy is not to use all available information at the receiver. The simplest would be to use in the state stream only the points z defined in (8) and use the likelihood

\[ \lambda_m(i) = \log \frac{w_m(u_i(j), v_i(j), z_i / x_i(j))}{w_m(u_i(j), v_i(j), z_i)} - R \tag{13} \]

where

\[ \begin{aligned}
w_m(0,v,0/0) &= w_m(1,v,0/1) = w(0,v/0)\, q_{m-1}(0) \\
w_m(0,v,1/0) &= w_m(1,v,1/1) = w(0,v/0)\, q_{m-1}(1) \\
w_m(1,v,0/0) &= w_m(0,v,0/1) = w(1,v/0)\, q_{m-1}(1) \\
w_m(1,v,1/0) &= w_m(0,v,1/1) = w(1,v/0)\, q_{m-1}(0)
\end{aligned} \tag{14} \]


is defined as in (l), and 


b/2 - 1 



v~0 




(15) 


Obviously, less severe restrictions on the information used are also possible. E.g., for the purposes of $Q_i$ computation one may wish to partition the v-alphabet into subsets and represent each subset by some new letter v*. The likelihood table size is then obtained by formula (12) into which the size of the v* alphabet has been substituted for b/2.

Let us note that the switch from likelihood (2) to (13) simply involves a switch between equivalent channels used by the receiver for decoding. The maximum information channel (using the Q state stream) is based on transmission probability $P\{\mathbf{Y}_i(m) = \mathbf{y}_i(m) / x_i(m)\}$ while the binary state stream channel is based on $w_m(u_i(j), v_i(j), z_i / x_i(j))$.

In general, let $w_k^*(y/x)$ denote the transmission probability of the equivalent channel used when decoding one of k undecoded streams, and define the function

\[ E_k(\sigma) = (1+\sigma) - \log \sum_{y} \left[ \sum_{x=0}^{1} w_k^*(y/x)^{\frac{1}{1+\sigma}} \right]^{1+\sigma} \tag{16} \]

Thus, for the BSC $E_k(\sigma)$ is given by

\[ E_k(\sigma) = \sigma - \log \left\{ \left[ ((1-p)\, q_{k-1}(0))^{\frac{1}{1+\sigma}} + (p\, q_{k-1}(1))^{\frac{1}{1+\sigma}} \right]^{1+\sigma} + \left[ ((1-p)\, q_{k-1}(1))^{\frac{1}{1+\sigma}} + (p\, q_{k-1}(0))^{\frac{1}{1+\sigma}} \right]^{1+\sigma} \right\} \tag{17} \]

For the binary input, b-nary output symmetrical channel with a Q-type state stream it is

\[ E_k(\sigma) = -\log \sum_{\mathbf{y}(k)} \left\{ \left[ w(y(k)/0) \left( f_+(\mathbf{y}(k-1)) + f_-(\mathbf{y}(k-1)) \right) \right]^{\frac{1}{1+\sigma}} + \left[ w(y(k)/1) \left( f_+(\mathbf{y}(k-1)) - f_-(\mathbf{y}(k-1)) \right) \right]^{\frac{1}{1+\sigma}} \right\}^{1+\sigma} + k + \sigma \tag{18} \]

where

\[ f_+(\mathbf{y}(k-1)) = \prod_{i=1}^{k-1} [w(y(i)/0) + w(y(i)/1)], \qquad f_-(\mathbf{y}(k-1)) = \prod_{i=1}^{k-1} [w(y(i)/0) - w(y(i)/1)] \tag{19} \]

Finally, for the same channel, when only the parity is used in the state stream,

\[ E_k(\sigma) = \sigma - \log \sum_{v} \left\{ \left[ (w(0,v/0)\, q_{k-1}(0))^{\frac{1}{1+\sigma}} + (w(1,v/0)\, q_{k-1}(1))^{\frac{1}{1+\sigma}} \right]^{1+\sigma} + \left[ (w(0,v/0)\, q_{k-1}(1))^{\frac{1}{1+\sigma}} + (w(1,v/0)\, q_{k-1}(0))^{\frac{1}{1+\sigma}} \right]^{1+\sigma} \right\} \tag{20} \]
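For the BSC the function (17) is easy to evaluate numerically. The sketch below is illustrative only. It assumes base-2 logarithms and assumes that $q_k(0)$ and $q_k(1)$ are the probabilities of an even and an odd number of errors in k independent uses of the BSC, i.e. $q_k(0) = [1 + (1-2p)^k]/2$; that definition is not legible in this part of the report and should be checked against the appendix. The names `parity_prob` and `E_bsc` are hypothetical.

```python
from math import log2

def parity_prob(k, p):
    """Assumed q_k(0), q_k(1): probability of an even / odd number of errors in k BSC(p) uses."""
    even = 0.5 * (1.0 + (1.0 - 2.0 * p) ** k)
    return even, 1.0 - even

def E_bsc(k, sigma, p):
    """E_k(sigma) of (17) for the BSC; k=None stands for k -> infinity (q = 1/2)."""
    q0, q1 = (0.5, 0.5) if k is None else parity_prob(k - 1, p)
    s = 1.0 / (1.0 + sigma)
    a = (((1 - p) * q0) ** s + (p * q1) ** s) ** (1 + sigma)
    b = (((1 - p) * q1) ** s + (p * q0) ** s) ** (1 + sigma)
    return sigma - log2(a + b)

if __name__ == "__main__":
    p = 0.05
    for k in (2, 3, 4, None):
        print(k, round(E_bsc(k, 1.0, p), 4))
```

Under these assumptions the limit k -> infinity at sigma = 1 coincides with the usual R_comp of the BSC, 1 - log2(1 + 2 sqrt(p(1-p))), and k = 1 gives E_1(sigma) = sigma.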
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In reference [3] the following two theorems are proven: 

Theorem 1

Let R be the convolutional coding rate used in each of m streams of a decoding block. The bootstrap hybrid decoding procedure based on $w_k^*(y/x)$ leads to a finite $\gamma$-th moment of computation per decoded digit provided

\[ \min\left( \sigma(k(R)),\ (k(R)+1)\, \sigma(\infty) \right) > \gamma \tag{21} \]

is satisfied, where k(R) is the unique integer such that

\[ k(R)\, \sigma(\infty) < \sigma(k(R)), \qquad \sigma(k(R)+1) < (k(R)+1)\, \sigma(\infty). \tag{22} \]

The function $\sigma(k)$ is the unique solution of

\[ R = \frac{E_k(\sigma)}{\sigma} \quad \text{for } \frac{E_k(2)}{2} \le R \le C \tag{23a} \]

and

\[ \sigma(k) = \frac{E_k(2)}{R} \quad \text{for } 0 < R < \frac{E_k(2)}{2}. \tag{23b} \]

The function $E_k(\sigma)$ is the concave, positive, increasing function of $\sigma > 0$ defined in (16).

Theorem 2

$E[N^{\gamma}]$ grows exponentially with block length T whenever

\[ \min[\sigma(2),\ k\sigma(k)] < \gamma \tag{24} \]

where $\sigma(k)$ is the solution of

\[ \frac{E_k(\sigma)}{\sigma} = R. \]

In this theorem it is assumed that R < C, and that the convolutional code used is a good one in the sense that its associated probability of error is exponentially optimal.

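Obviously, the net transmission rate (taking into account the loss due to the extra parity stream) is

\[ R_{NET} = \frac{m-1}{m}\, R \tag{25} \]

Ordinarily, one would wish to transmit at a rate exceeding $R_{comp}$ of the underlying channel so that $\sigma(\infty) < 1$. Define $R^L_{BOOT}(\gamma)$ to be the supremum of rates for which (21) is satisfied. Then we can say that the $\gamma$-th computational moment will be bounded for the bootstrap hybrid scheme using m streams provided the net rate satisfies

\[ R_{NET} < \frac{m-1}{m}\, R^L_{BOOT}(\gamma) \tag{26} \]

Define $R^U_{BOOT}(\gamma)$ as the greatest lower bound on rates for which (24) is satisfied. Then the $\gamma$-th computational moment will grow exponentially with block length if

\[ R_{NET} > \frac{m-1}{m}\, R^U_{BOOT}(\gamma) \tag{27} \]

In reference [3] we show that $R^L_{BOOT}(\gamma)$ and $R^U_{BOOT}(\gamma)$ can be computed by the formulas

\[ R^L_{BOOT}(\gamma) = \min_{k \ge 2} \max\left[ \frac{1}{\gamma} E_k(\gamma),\ \frac{k}{\gamma} E_\infty\!\left(\frac{\gamma}{k}\right) \right] \tag{28} \]

and

\[ R^U_{BOOT}(\gamma) = \min\left[ \frac{1}{\gamma} E_2(\gamma),\ \min_{k \ge 3} \frac{k}{\gamma} E_k\!\left(\frac{\gamma}{k}\right) \right] \tag{29} \]

When evaluating $R^L_{BOOT}(\gamma)$ one computes the differences $\frac{1}{\gamma} E_k(\gamma) - \frac{k}{\gamma} E_\infty(\gamma/k)$ for k = 2,3,... until their value becomes negative. If this takes place for $k^+$, then

\[ R^L_{BOOT}(\gamma) = \min\left[ \frac{1}{\gamma} E_{k^+ - 1}(\gamma),\ \frac{k^+}{\gamma} E_\infty\!\left(\frac{\gamma}{k^+}\right) \right] \tag{30} \]

It can be shown that the function $\frac{k}{\gamma} E_k(\gamma/k)$, k = 3,4,... has at most one local minimum and no local maxima. Therefore when trying to evaluate $R^U_{BOOT}(\gamma)$ one computes the differences

\[ \frac{k}{\gamma} E_k\!\left(\frac{\gamma}{k}\right) - \frac{k+1}{\gamma} E_{k+1}\!\left(\frac{\gamma}{k+1}\right) \]

for k = 3,4,... until their value becomes negative. If this takes place for $k^+$, then

\[ R^U_{BOOT}(\gamma) = \min\left[ \frac{1}{\gamma} E_2(\gamma),\ \frac{k^+}{\gamma} E_{k^+}\!\left(\frac{\gamma}{k^+}\right) \right] \tag{31} \]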
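The lower rate bound can be evaluated directly from the min-max form (28). The following sketch does this for the BSC. It is an illustration under stated assumptions, not the authors' program: it takes (28) in the form $\min_{k \ge 2} \max[E_k(\gamma)/\gamma,\ (k/\gamma) E_\infty(\gamma/k)]$, uses base-2 logarithms, and assumes $q_k(0) = [1 + (1-2p)^k]/2$ for the parity probabilities. The function names and the search limit `k_max` are hypothetical.

```python
from math import log2

def E_bsc(k, sigma, p):
    """E_k(sigma) for the BSC as in (17); k=None stands for k -> infinity."""
    if k is None:
        q0 = q1 = 0.5
    else:
        q0 = 0.5 * (1.0 + (1.0 - 2.0 * p) ** (k - 1))  # assumed q_{k-1}(0)
        q1 = 1.0 - q0
    s = 1.0 / (1.0 + sigma)
    a = (((1 - p) * q0) ** s + (p * q1) ** s) ** (1 + sigma)
    b = (((1 - p) * q1) ** s + (p * q0) ** s) ** (1 + sigma)
    return sigma - log2(a + b)

def r_boot_lower(gamma, p, k_max=200):
    """Assumed form of (28): min over k >= 2 of max[E_k(gamma)/gamma, (k/gamma) E_inf(gamma/k)]."""
    best = float("inf")
    for k in range(2, k_max + 1):
        first = E_bsc(k, gamma, p) / gamma
        second = k * E_bsc(None, gamma / k, p) / gamma
        best = min(best, max(first, second))
    return best

if __name__ == "__main__":
    p = 0.05
    capacity = 1 + p * log2(p) + (1 - p) * log2(1 - p)
    print("R_comp    =", round(E_bsc(None, 1.0, p), 4))
    print("R_BOOT^L  =", round(r_boot_lower(1.0, p), 4))
    print("capacity  =", round(capacity, 4))
```

The exhaustive loop is used instead of the stopping rule in (30) only to keep the example short; both should give the same value when the first term decreases and the second increases with k.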

The qualitative improvement achieved by bootstrap hybrid decoding over straight sequential decoding for the BSC can be estimated from a comparison of the curve $R_{comp}/C$ vs. p (C is the capacity) with the curve $R^L_{BOOT}(1)/C$ vs. p. Figure 5 shows the corresponding plots together with those of $R_{FAL}(1)/C$ vs. p and $R^U_{BOOT}(1)/C$ vs. p. The quantity $R_{FAL}(1)$ is the rate above which the Falconer [4] scheme has an unbounded first computation moment. None of the latter three curves takes account of the algebraic degradation factor (m-1)/m (see (26)) which must be used when any particular hybrid set-up is compared with straight sequential decoding.

Figure 6 shows the curves $R^U_{BOOT}(1)$, $R^L_{BOOT}(1)$, $R_{FAL}(1)$, $R_{comp}$, and C plotted against the signal-to-noise ratio (in dB) per bit transmitted through a hard quantized Gaussian channel with binary inputs. It can be seen that using convolutional codes of rate 1/2 a hybrid scheme with m = 10 streams will perform satisfactorily with an SNR per information bit that is at least 1.47 dB smaller than the SNR needed for straight sequential decoding. Figure 7 shows the first four curves normalized by the fifth (capacity C). Finally, in Figure 8 we plot the values of $\gamma$ vs. SNR per transmitted bit that are solutions to the equations $R^L_{BOOT}(\gamma_{lower}) = 1/2$ and $R^U_{BOOT}(\gamma_{upper}) = 1/2$ for the BSC obtained from a Gaussian additive noise channel. For comparison we also plot the Pareto exponent $\sigma$ that corresponds to straight sequential decoding. In this connection the reader should recall the simulation results of the preceding section that seemed to indicate that the "practical" Pareto exponent is higher than the limiting theoretical one of Figure 8.

We have also evaluated theoretical performance curves for the binary input Gaussian channel with octal and quaternary quantization. To make comparison easy, Figures 10 through 15 are all drawn to the same scale. The quantization levels used throughout are the ones maximizing $R_{comp}$, as obtained by Lumb [5]. Therefore slight improvements might be possible in the C, $R^U_{BOOT}(1)$, and $R^L_{BOOT}(1)$ curves, if the optimization were to be carried out with respect to those parameters. Figure 10 shows the relationship between capacities and $R_{comp}$'s for binary, quaternary, and octal quantizations. Figure 9 gives the ratios $R_{comp}/C$ for these quantizations which show the margin of possible improvement attainable through more sophisticated methods of which bootstrap hybrid decoding is an example. We see that the margin decreases as the number of quantization levels increases.

Figure 11 contains plots of $R_{BOOT}(1)$ vs. SNR per transmitted digit for the three kinds of quantization when maximal information is used to form the state stream. Figure 12 provides the same curves when the state stream is binary instead (i.e. when the likelihood is given by (13)). The next two figures show clearly that the degradation in performance is only a very slight one and it might well be worth that price to obtain the attendant reduction in decoder complexity. Figure 13 compares C and $R_{comp}$ for the quaternary channel with $R_{BOOT}(1)$ curves for the binary and full channel state. Figure 14 does the same thing for the octal channel. However, it turns out that in this case the difference between full channel state performance and a "quaternary" one is in the third significant digit and thus the latter curve cannot be entered separately into the graph. A "quaternary" state is one that would result if the 8 channel outputs were optimally partitioned into 4 classes and membership in the latter was used to determine the Q-type channel state. Finally, Figure 15 is a plot of $R_{BOOT}(1)$ vs. SNR per transmitted bit for binary, quaternary and octal channel quantizations when a binary channel state is used. The quaternary and octal curves are so close to capacity that it would be impossible to draw the $R_{BOOT}(1)$ curves for full information channel states.




II-B-3. A Bound on a Computational Parameter of Bootstrap Hybrid Decoding


Let a bootstrap hybrid scheme involve transmission of m streams, m-1 carrying information. Let the decoding be of the rudimentary kind: one either succeeds in decoding a stream entirely, in which case the state information is adjusted and decoding of the next stream is attempted, or one does not succeed in decoding a stream, in which case one passes to the next stream without having made any state adjustment. Decoding of any undecoded stream always starts from the first digit, regardless of whether previous decoding attempts at that stream have been made. Let us next define $N_i(K)$ to be the number of decoding steps in the first incorrect subset of the i-th among the K streams that have been left undecoded (i.e. M > K streams were received, M-K were decoded by the hybrid method, and K streams, probably the most difficult ones, are still to be decoded). We suggest that a very good measure of computational complexity is the parameter

\[ E\left[ \max_{2 \le K \le M}\ \min_{1 \le i \le K} N_i(K) \right] \]

which may be interpreted as the expected maximum number of decoding steps that need be done in the course of decoding of the entire hybrid block in any first incorrect subset.

In Appendix 4 we find the rate below which the above quantity is bounded by a constant. The derivation is applicable to all channels symmetrical from the input (included in this class are all discrete channels derived through quantization of Gaussian additive noise channels). In the next reporting period we will evaluate these limiting rates for some channels of interest.



II-C. Development of Good Convolutional Codes


A binary, rate R = 1/n convolutional code of constraint length $\nu$ can be specified by n generators

\[ G^{(j)}(D) = g_0^{(j)} + g_1^{(j)} D + g_2^{(j)} D^2 + \cdots + g_{\nu-1}^{(j)} D^{\nu-1}, \quad j = 1, 2, \ldots, n \tag{1} \]

It is assumed that for at least one value of $j_1$ and $j_2$, $g_0^{(j_1)} = 1$ and $g_{\nu-1}^{(j_2)} = 1$.

Every input sequence $i_0, i_1, \ldots, i_k$ can be represented by its D-transform polynomial

\[ I(D) = i_0 + i_1 D + \cdots + i_k D^k \tag{2} \]

If by convention $i_t = 0$ for t > k, then the encoder outputs for such an input are the sequences

\[ X^{(j)}(D) = G^{(j)}(D)\, I(D) = x_0^{(j)} + x_1^{(j)} D + \cdots + x_{k+\nu-1}^{(j)} D^{k+\nu-1}, \quad j = 1, 2, \ldots, n \tag{3} \]

where [1]

\[ x_t^{(j)} = i_t\, g_0^{(j)} \oplus i_{t-1}\, g_1^{(j)} \oplus \cdots \oplus i_{t-\nu+1}\, g_{\nu-1}^{(j)} = \bigoplus_{m=0}^{\nu-1} i_{t-m}\, g_m^{(j)} \tag{4} \]

A convolutional code is called systematic if $G^{(1)}(D) = 1$. In that case $X^{(1)}(D) = I(D)$ which is desirable for some applications.

[1] $\bigoplus$ denotes summation over GF(2); $\sum$ denotes summation over the integers.

The output sequences produced by the encoder are conveniently represented by the state space trellis diagram of the code given in Figure 16. The state S(t) of the encoder at time t is determined by the ($\nu$-1) preceding information digits:

\[ S(t) = (i_{t-1}, i_{t-2}, \ldots, i_{t-\nu+1}) \tag{5} \]

where $i_t = 0$ for t < 0 and t > k. There are $2^{\nu-1}$ different states for an encoder of constraint length $\nu$. For each state S(t) there are two possible values of S(t+1), depending on whether $i_t$ = 0 or 1. The state space diagram shows the possible transitions for t = 0,1,2,... The branches in the diagram are labelled with the outputs corresponding to the transitions. The trellis of Figure 16 corresponds to the rate 1/2 code $G^{(1)}(D) = 1 + D^2$, $G^{(2)}(D) = 1 + D + D^2$. Since $g_0^{(j_1)} = 1$ for at least one $j_1 \in \{1, 2, \ldots, n\}$, the two branches diverging from a state cannot be identical. Similarly, since $g_{\nu-1}^{(j_2)} = 1$ for at least one $j_2 \in \{1, 2, \ldots, n\}$, the two branches converging into a state cannot be identical.
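The encoding rule (4) amounts to a convolution over GF(2). A minimal sketch is given below; the function name `encode` and the representation of each generator as a list of coefficients $(g_0, g_1, \ldots, g_{\nu-1})$ are choices made for this example only.

```python
def encode(info_bits, generators):
    """Rate 1/n convolutional encoder implementing eq. (4).

    info_bits  : list of 0/1 digits i_0, ..., i_k
    generators : list of n coefficient lists (g_0, ..., g_{nu-1}), all of length nu
    Returns n output sequences x^(j)_0, x^(j)_1, ..., one per generator,
    including the tail produced while the encoder register empties.
    """
    nu = len(generators[0])
    length = len(info_bits) + nu - 1
    outputs = []
    for gen in generators:
        seq = []
        for t in range(length):
            bit = 0
            for m in range(nu):
                if 0 <= t - m < len(info_bits):   # i_t = 0 outside the input block
                    bit ^= info_bits[t - m] & gen[m]
            seq.append(bit)
        outputs.append(seq)
    return outputs

if __name__ == "__main__":
    # The code of Figure 16: G1 = 1 + D^2, G2 = 1 + D + D^2.
    code = [[1, 0, 1], [1, 1, 1]]
    print(encode([1, 1], code))   # I(D) = 1 + D
```

For I(D) = 1 + D the two outputs are the coefficient lists of (1 + D)(1 + D^2) and (1 + D)(1 + D + D^2) = 1 + D^3.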

The coder is initially started in state S(0) = 0 = (0,0,...,0). For every input polynomial I(D), there is a series of state transitions 0 = S(0) → S(1) → S(2) → ... → S($\nu$+k-1) → S($\nu$+k) = 0. Tracing the path corresponding to this series of transitions through the trellis diagram determines the output sequence corresponding to I(D).

Let 0* denote the path 0 → 0 → 0 → ..., i.e. the path corresponding to an input of all zeros. Massey and Sain [6] have called a code catastrophic if there is an infinite path through the trellis that has no branch in common with the 0* path and whose Hamming weight is finite. The reason for this nomenclature is that in such a code a finite number of transmission errors may cause an infinite number of errors in the decoded information sequence I(D). Massey and Sain [6] have shown that a rate 1/n code is catastrophic if and only if

\[ \text{g.c.d.}\left[ G^{(1)}(D), G^{(2)}(D), \ldots, G^{(n)}(D) \right] \ne D^r \tag{6} \]

for every non-negative integer r.
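The Massey-Sain test can be sketched with polynomials over GF(2) stored as Python integers, where bit m holds the coefficient of $D^m$. The helper names below are hypothetical; the check declares a code non-catastrophic exactly when the g.c.d. of its generators is a single power of D.

```python
from functools import reduce

def gf2_mod(a, b):
    """Remainder of a(D) divided by b(D) over GF(2); polynomials are ints, b != 0."""
    db = b.bit_length()
    while a.bit_length() >= db:
        a ^= b << (a.bit_length() - db)
    return a

def gf2_gcd(a, b):
    """Greatest common divisor of two GF(2) polynomials (Euclid's algorithm)."""
    while b:
        a, b = b, gf2_mod(a, b)
    return a

def is_catastrophic(generators):
    """True unless gcd of the generators equals D^r, i.e. has exactly one nonzero coefficient."""
    g = reduce(gf2_gcd, generators)
    return g & (g - 1) != 0

if __name__ == "__main__":
    print(is_catastrophic([0b101, 0b111]))  # 1 + D^2 and 1 + D + D^2
    print(is_catastrophic([0b011, 0b101]))  # 1 + D and 1 + D^2 = (1 + D)^2
```

The second pair shares the factor 1 + D, so an input of infinite weight, 1/(1 + D), produces an output of finite weight.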

Let $X_t = [x_t^{(1)}, x_t^{(2)}, \ldots, x_t^{(n)}]$ denote the block of n output symbols at time t. The minimum distance of the code generated by $G^{(j)}(D)$, j = 1,2,...,n is

\[ d_m(G^{(1)}, \ldots, G^{(n)}) = \min_{I(D):\ i_0 = 1}\ \sum_{t=0}^{\nu-1} \omega_H(X_t) = \min_{I(D):\ i_0 = 1}\ \sum_{j=1}^{n} \omega_H\!\left( X^{(j)}(D) \bmod D^{\nu} \right) \tag{7} \]

where $\omega_H$ is the usual Hamming weight operator. In the trellis diagram this corresponds to the weight of the minimum weight path of exactly $\nu$ branches which diverges from the state 0 at t = 0. Bussgang [7], Lin and Lyne [8], and Costello [9] have explored methods for constructing codes with large minimum distance.

The free distance of the code is

\[ d_f(G^{(1)}, \ldots, G^{(n)}) = \min_{I(D):\ i_0 = 1}\ \sum_{t=0}^{\infty} \omega_H(X_t) = \min_{I(D):\ i_0 = 1}\ \sum_{j=1}^{n} \omega_H\!\left( X^{(j)}(D) \right) \tag{8} \]

In the trellis diagram this corresponds to the weight of the minimum weight path of arbitrary length that diverges from the state 0 at t = 0 and reconverges to the state 0 at some later time. For the binary symmetric channel, maximum likelihood decoding corresponds to a search for that trellis path whose Hamming distance from the received sequence is minimal. Since convolutional codes are linear, free distance is a good indication of maximum likelihood decoding strength of the code at least for low crossover probabilities. Minimum distance is in the same way important for feedback decoding of convolutional codes. Moreover, the computational effort in sequential decoding seems strongly influenced by $d_m$.
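Both distances can be computed by brute force for short constraint lengths. The sketch below is a generic illustration and is not the special search procedure for complementary codes discussed in this report: the minimum distance is found by enumerating all inputs with $i_0 = 1$ over the first $\nu$ branches, and the free distance by a shortest-path (Dijkstra) search through the trellis from the first divergence back to the zero state. All function names are hypothetical, and generators are given as coefficient lists $(g_0, \ldots, g_{\nu-1})$.

```python
import heapq
from itertools import product

def branch_weight(register, generators):
    """Hamming weight of the n output digits for register (i_t, i_{t-1}, ..., i_{t-nu+1})."""
    return sum(sum(r & g for r, g in zip(register, gen)) % 2 for gen in generators)

def minimum_distance(generators):
    """d_m: least weight of the first nu branches over all inputs with i_0 = 1."""
    nu = len(generators[0])
    best = None
    for tail in product((0, 1), repeat=nu - 1):
        bits = (1,) + tail
        weight = 0
        for t in range(nu):
            register = tuple(bits[t - m] if t - m >= 0 else 0 for m in range(nu))
            weight += branch_weight(register, generators)
        best = weight if best is None else min(best, weight)
    return best

def free_distance(generators):
    """d_f: least weight of a path that leaves the zero state and later returns to it."""
    nu = len(generators[0])
    zero = (0,) * (nu - 1)
    first = (1,) + zero                      # register after the diverging input digit 1
    start = first[:-1]
    dist = {start: branch_weight(first, generators)}
    heap = [(dist[start], start)]
    while heap:
        weight, state = heapq.heappop(heap)
        if state == zero:
            return weight
        if weight > dist[state]:
            continue                          # stale heap entry
        for bit in (0, 1):
            register = (bit,) + state
            nxt = register[:-1]
            w = weight + branch_weight(register, generators)
            if w < dist.get(nxt, float("inf")):
                dist[nxt] = w
                heapq.heappush(heap, (w, nxt))
    return None

if __name__ == "__main__":
    code = [[1, 0, 1], [1, 1, 1]]            # G1 = 1 + D^2, G2 = 1 + D + D^2
    print(minimum_distance(code), free_distance(code))
```

The search visits at most $2^{\nu-1}$ states, so it is practical only for short codes.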

We have derived the following upper bound on $d_f$.

Theorem 1

For all rate 1/n convolutional codes of constraint length $\nu$, the free distance is upper bounded by

\[ d_f \le \frac{n}{2} \left( \nu + [\log_2 \nu] + 1 \right) \]


The evaluation of $d_f$ of an arbitrary code is quite complicated, because one may have to search very deep into the coding tree to determine what $d_f$ is. Although it is conjectured that the degree of the information sequence I(D) that achieves $d_f$ is only of the order of $\nu \log \nu$, the best general bound on that degree for rate 1/2 codes is [10]

\[ (\nu + \log \nu)(\nu - 1) + 1. \]

For the class of complementary codes that bound can be lowered to $\nu(\nu-3)$, but what is more important, a very efficient search procedure determining $d_f$ exists that allows early identification of I(D) sequences that cannot possibly achieve $d_f$. Moreover, the $d_f$ values of the best complementary codes are excellent and seem to grow as $\nu$.


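A rate 1/2 code is a complementary code if and only if

\[ g_0^{(1)} = g_0^{(2)} = g_{\nu-1}^{(1)} = g_{\nu-1}^{(2)} = 1 \quad \text{and} \quad g_i^{(1)} \oplus g_i^{(2)} = 1 \ \text{ for } 0 < i < \nu - 1. \]

The generators may therefore be written as

\[ G^{(1)}(D) = 1 + g_1 D + g_2 D^2 + \cdots + g_{\nu-2} D^{\nu-2} + D^{\nu-1} \]

\[ G^{(2)}(D) = 1 + \bar{g}_1 D + \bar{g}_2 D^2 + \cdots + \bar{g}_{\nu-2} D^{\nu-2} + D^{\nu-1} \]

where $\bar{g}_i$ is the binary complement of $g_i$, i.e. $\bar{g}_i = g_i \oplus 1$, so that

\[ G^{(2)}(D) = G^{(1)}(D) + D + D^2 + \cdots + D^{\nu-2} \]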

Following Massey, we can use this relation between $G^{(1)}(D)$ and $G^{(2)}(D)$ to reduce the number of adders needed to implement the encoder. If the superscripts of the generators are assigned so that

\[ \sum_{i=1}^{\nu-2} \omega_H(g_i) \le \sum_{i=1}^{\nu-2} \omega_H(\bar{g}_i) \]

then the encoding circuit is that of Figure 17.
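The relation between the two generators makes it easy to derive the second generator from the first. In the sketch below a generator is an integer whose bits are the coefficients of the polynomial (as obtained from the octal listings); because the middle coefficients are complemented symmetrically, the result does not depend on whether the leading octal digit is read as $g_0$ or $g_{\nu-1}$. The function names are hypothetical.

```python
def complementary_partner(g1: int) -> int:
    """Return G2 = G1 + D + D^2 + ... + D^(nu-2) over GF(2), with nu = number of coefficients."""
    nu = g1.bit_length()
    if nu < 2 or not g1 & 1:
        raise ValueError("first and last coefficients of the generator must be 1")
    middle_mask = (1 << (nu - 1)) - 2      # ones in positions 1 .. nu-2
    return g1 ^ middle_mask

def impulse_response_weight(g1: int) -> int:
    """Total Hamming weight of both generators, i.e. the codeword weight for I(D) = 1."""
    return bin(g1).count("1") + bin(complementary_partner(g1)).count("1")

if __name__ == "__main__":
    for octal in ("5", "13", "31"):
        g1 = int(octal, 8)
        print(octal, oct(complementary_partner(g1)), impulse_response_weight(g1))
```

Since the two generators agree in their end coefficients and differ in every middle coefficient, the input I(D) = 1 always yields a codeword of weight $\nu$ + 2, so $d_f \le \nu + 2$ for every complementary code.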

As mentioned, the structure of complementary codes allows construction of an efficient algorithm based on the stack decoding principle that determines $d_f$.

The stack is arranged according to the values of a lower bound W(t) on the weights of all possible codewords corresponding to extensions of some given input sequence I(D) of length t. The top of the stack is allocated to the codeword of lowest weight. Since it turns out that only sequences I(D) of even weight can achieve $d_f$, the search considers inputs

\[ P(D) = I(D)/(1+D) \tag{10} \]

to the convolutional code

\[ \tilde{G}^{(1)}(D) = (1+D)\, G^{(1)}(D), \qquad \tilde{G}^{(2)}(D) = (1+D)\, G^{(2)}(D) \tag{11} \]

Each entry in the stack contains the following information:

a) $U(t) = (p_t, p_{t-1}, \ldots, p_{t-\nu+1})$, the current contents of the encoder,

b) W(t), the lower bound on the weight of codewords corresponding to extensions of P(D) considered,

c) $C^P(t)$, a count of the length of the last run of zeros in P(D) = I(D)/(1+D) (P(D) can be discarded if $C^P(t) > \nu-2$),

d) $C^V(t)$, a count of the length of the last run of zeros in $v_0 + v_1 D + \cdots + v_t D^t$ where $V(D) = [(1+D^{\nu-2})/(1+D)]\, I(D)$ (P(D) can be discarded if $C^V(t) > \nu$).

The following is then the algorithm.

I. Initialization:

The stack contains one entry U(0) = (1,0,...,0), W(0) = 6, $C^P(0)$ = 0, $C^V(0)$ = 0.

II. Regular operation:

1) If stack is empty, go to 17, else continue.

2) Eliminate top entry of stack, U(t), W(t), $C^P(t)$, $C^V(t)$. If $C^P(t) = \nu-2$ go to 16.

3) $U(t+1) \leftarrow (0, p_t, p_{t-1}, \ldots, p_{t-\nu+2})$ [zero extn.]

4) If $p_{t-\nu+3} = 0$, $C^V(t+1) \leftarrow C^V(t) + 1$, else $C^V(t+1) = 0$. If $C^V(t+1) = \nu$ go to 9, else continue.

5) $C^P(t+1) \leftarrow C^P(t) + 1$

6) $W(t+1) = W(t) + \omega_H(X^{(1)}_{t+1}, X^{(2)}_{t+1}) - \omega_H(p_t \oplus p_{t-\nu+2})$ where $(X^{(1)}_{t+1}, X^{(2)}_{t+1})$ is the output of the encoder.

7) If W(t+1) > $\nu$ + 2, go to 9, else continue.

8) Insert entry U(t+1), W(t+1), $C^P(t+1)$, $C^V(t+1)$ in stack according to value of W(t+1).

9) $U(t+1) \leftarrow (1, p_t, p_{t-1}, \ldots, p_{t-\nu+2})$ [one's extn.]

10) If $p_{t-\nu+3} = 1$, $C^V(t+1) \leftarrow C^V(t) + 1$, else $C^V(t+1) = 0$. If $C^V(t+1) = \nu$ go to 1, else continue.

11) $C^P(t+1) = 0$

12) $W(t+1) \leftarrow W(t) + \omega_H(X^{(1)}_{t+1}, X^{(2)}_{t+1}) + 2\omega_H(1 \oplus p_{t-\nu+2}) - \omega_H(p_t \oplus p_{t-\nu+2})$

13) If W(t+1) > $\nu$ + 2, go to 1, else continue.

14) Insert U(t+1), W(t+1), $C^P(t+1)$, $C^V(t+1)$ in stack according to value of W(t+1).

15) Go to 1.

16) $d_f$ = W(t). Stop.

17) $d_f$ = $\nu$ + 2. Stop.


The free distance achieved by the complementary codes given in Table I is far in excess of any other known rate 1/2 codes. Figure 18 shows a comparison of the free distance of complementary codes with various bounds. It is seen that the codes come quite close to achieving the upper bound of Theorem 1. Neumann [11] has obtained a lower bound for free distance, but his bound is weaker than the usual Gilbert bound for short constraint lengths. It is seen that the complementary codes are far better than the Gilbert bound, which is of course a lower bound on $d_f$ as well as $d_m$. Figure 18 also contains the Costello [9] lower bound for time-varying codes. It should be pointed out that the Costello bound is asymptotic and does not necessarily apply at short constraint lengths.

Figure 19 shows a comparison of the free distance of complementary codes with some other known codes. Costello [9] has devised two algorithms, A6 (systematic) and A9 (nonsystematic), to construct codes with large free distance. It is seen that the complementary codes do far better than either of these codes. Also included in the comparison is the Lin-Lyne code [8]. Figure 20 gives a comparison of the number of steps taken by the usual stack algorithm (i.e. one that would examine inputs I(D) to $G^{(1)}(D)$ and $G^{(2)}(D)$ and would use only the structure properties of general convolutional codes) with the steps taken by the special algorithm for complementary codes. The comparison is made for the codes in Table II. It is evident that the special algorithm provides a tremendous advantage in computing the free distance of these codes.

Figure 21 shows that the minimum distance of the complementary codes always exceeds the Gilbert bound. At most constraint lengths the minimum distance equals the minimum distance of the Lin-Lyne code.

Some complementary codes were used in simulation studies for sequential and maximum likelihood decoding on a binary symmetric channel. The performance of these codes was consistently better than all other known codes [12].

The motivation for this work was to look for methods of constructing convolutional codes with large free distance. The results are partially successful since we found a good class of rate 1/2 codes whose free distance exceeds the free distance of any other known codes. However, such codes were found only for $\nu \le 24$ and there is no evidence to show whether good codes do or do not exist for longer constraint lengths. The major problem in searching for long codes is that the amount of computation needed to calculate the free distance grows at least exponentially. We were able to utilize the special properties of complementary codes to cut down on the amount of computation.

Unfortunately there does not appear to be any simple way to generalize these codes to rates other than 1/2.


  ν    Gen. (octal)    d_free    d_min    wt.

  3    5                  5        3       2
  4    13                 6        3       3
  5    31                 7        4       3
  6    61                 8        4       3
  7    121                9        5       3
  8    211               10        5       4
  9    503               11        6       4
 10    1065              12        6       5
 11    2415              13        7       5
 12    5121              14        7       5
 13    12043             15        7       5
 14    24421             16        8       5
 15    51303             17        7       7
 16    120643            18        8       7
 17    352411            18        9       8
 18    425551            20        8       9
 19    1411041           20        9       6
 20    2734605           20       10      11
 21    5011303           22        9       8
 22    11047441          22       10       9
 23    22517023          24       10      11
 24    51202215          24       10       9

Table I. R = 1/2 Complementary Codes





II-D. Application of Bootstrapping to Maximum Likelihood Decoding of Convolutional Codes

We are trying to see if the basic idea of bootstrap hybrid sequential decoding can also be helpful to the Viterbi decoder. It will hopefully reduce the decoding complexity (which in the Viterbi algorithm grows exponentially with the constraint length $\nu$) for a given probability of error and transmission rate.

We have completed a Fortran program whose basic idea is as follows: There are m-1 convolutionally encoded information streams and their exclusive-or sum forms the m-th parity stream (the BSC is implied). After reception the channel state stream is found in the usual way. Viterbi decoding of the first stream is undertaken whose likelihood values are based on the state information. The likelihood function of the decoded path is then examined and with its help reliable subintervals of the path are determined (e.g. a subsequence of the decoded sequence is considered reliable if it corresponds to a consistently rising likelihood). These are substituted for the corresponding portions of the received sequence and the state sequence is accordingly recomputed. The second stream is then decoded and its reliable subintervals determined. The transmitted digits falling within these subintervals replace the received digits and the state sequence is again adjusted. This work continues in a round robin fashion as long as re-decoding of received streams results in an enlargement of the reliable subintervals. When no such enlargement occurs for any of the m streams, computation stops and the paths decoded last are supplied to the user.

The main problem in running this algorithm is the finding of criteria that could be used to determine the reliable subintervals. We have written a program that collects statistics on the behavior of the likelihood function of decoded sequences when it corresponds to correct and incorrect information supplied to the user. The criteria will of course be more stringent the smaller the code constraint length and the larger the channel error probability. We hope to report some initial results soon.
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la: Encoding circuit of a single convolutional generator. 
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Circuit that recovers information digits s n ^i» s n _2’ * *'* ’ S o 


from parity polynomial P n (D) and parity digits p 
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n-2 
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2b: 


Circuit that obtains P 1 ? ^(D) from P^CD) and x. . . It 

i i i,n-l 

can also be used to recover the information digits S n _^,..,,s o 
from P”(D) and x i>n _ r x i>n _ 2 X i, 0 ’ 

Feedback circuit that obtains s o s **” ,S n-l J ^ rom x ]_ r_i‘ 

Initial contents of the shift register .are 0's. 


3: Empirical distribution of the speed factor necessary for rudimentary bootstrap hybrid decoding of $R_{NET}$ = 0.45, m = 10 over a BSC with p = 0.056.

4: Empirical distribution of the speed factor necessary for rudimentary bootstrap hybrid decoding of $R_{NET}$ = 0.45, m = 10 over a BSC with p = 0.07.

5: Comparison of performance characteristics of sequential decoding, Falconer's hybrid decoding, and bootstrap hybrid decoding over the BSC. The experimental points denote simulations at R = 0.5 referred to in Table 4.

6: Comparison of performance characteristics of sequential decoding, Falconer's hybrid decoding, and bootstrap hybrid decoding with the capacity of a Gaussian channel with binary inputs and outputs.

7: Plots of $R_{comp}/C$, $R_{FAL}/C$, $R^L_{BOOT}(1)/C$, and $R^U_{BOOT}(1)/C$ as a function of SNR per transmitted digit in dB for the binary quantized Gaussian channel with binary inputs.

8: Upper and lower bounds to the Pareto exponent $\gamma$ for hybrid decoding as a function of SNR per transmitted digit (dB) when the convolutional rate R = 1/2. The sequential decoding Pareto exponent $\sigma$ is provided for comparison.

ternary and octal quantization when state stream contains maxi

12: The parameter of Fig. 11 when state stream is binary.

$R_{BOOT}(1)$ vs. SNR per transmitted digit (dB) curves for binary,

the channel state stream is binary.

16: The trellis state diagram for the code $G^{(1)}(D) = 1 + D + D^2$, $G^{(2)}(D) = 1 + D^2$.

17: A simplified encoding circuit for complementary rate 1/2 codes.

18: Free distance of best complementary codes compared to the best available bounds.

19: Free distance of complementary and other best codes.

20: Computational effort necessary to determine free distance of an ordinary stack algorithm and of the special algorithm utilizing the structure of complementary codes.

21: Minimum distance of the highest free distance complementary codes compared to the Gilbert bound and to the minimum distance of the best available codes.
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Figure 2b: Feedback circi 
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Empirical distribution of the speed factor necessary for rudimentary bootstrap hybrid decoding 
of Rjrgm = 0.045, m = 10 over a BSC with p = 0.07. 






Figure 5: Comparison of performance characteristics of sequential decoding, Falconer's hybrid decoding, and 
bootstrap hybrid decoding over the BSC. The experimental points denote simulations at R = 0.5 
referred to in Table 4. 
Figure 6: Comparison of performance characteristics of sequential decoding, 
Falconer's hybrid decoding, and bootstrap hybrid decoding with the 
capacity of a Gaussian channel with binary inputs and outputs. 
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binary inputs. 


Figure 8: Upper and lower bounds to the Pareto exponent y for hybrid decoding as 
a function of SNR per transmitted digit (dB) when the convolutional 
rate R = 1/2. The sequential decoding Pareto exponent a is provided 
for comparison. 
Figure 10: C and $R_{comp}$ vs. SNR per transmitted digit (dB) for binary, quaternary and octal optimal 
quantization of the Gaussian channel with binary inputs. 







Figure 12: The parameter of Figure 11 when state stream is binary. 

Figure 18: Free distance of best complementary codes compared to the best 
available bounds. 

Figure 19: Free distance of complementary and other best codes. 

Figure 20: Computational effort necessary to determine free distance of an 
ordinary stack algorithm and of the special algorithm utilizing 
the structure of complementary codes. 

Figure 21: Minimum distance of the highest free distance complementary codes 
compared to the Gilbert bound and to the minimum distance of the 
best available codes. 
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III. REPORT ON PHASE II 


III-A. Tree Encoding of Sources with a Fidelity Criterion 
III-A-1. Experimental Comparison of Two Encoding Algorithms 


The most general theoretical formulation of the data compression 
problem was provided by Shannon in 1959 in his paper "Coding Theorems 
for a Discrete Source with a Fidelity Criterion." [1] He enlarged there 
on his 1949 source coding ideas [2] referred to in the literature as 
variable length source coding and block source coding. Concisely stated, 
Shannon's results are as follows. Let a memoryless source of alphabet 
$A = (0,1,\ldots,a-1)$ governed by the probability distribution $Q(z)$, 
$z \in A$, be given. Let an approximation of the source outputs in the 
reproducer alphabet $B = (0,1,\ldots,b-1)$ be desired (in practice $b \le a$) 
with an attached additive per letter distortion criterion $d(z,\hat{z})$ 
defined for all pairs $z \in A$, $\hat{z} \in B$ (i.e. the distortion between 
sequences $z^n = z_1,\ldots,z_n$ and $\hat{z}^n = \hat{z}_1,\ldots,\hat{z}_n$ is 
defined to be $d(z^n,\hat{z}^n) = \sum_{i=1}^{n} d(z_i,\hat{z}_i)$). Let 
$T_n(z^n)$ be an encoding function that assigns some reproducer sequence 
$\hat{z}^n$ to each possible source sequence $z^n$. The rate of the 
resultant code is defined to be $R = \log \|T_n\| / n$, where $\|T_n\|$ 
denotes the number of sequences in the range of $T_n$. Shannon shows 
the existence of a rate distortion function R(D) [whose shape depends 
on Q( ) and d( , ) only] that has the following properties: 

a) for all n and all codes $T_n$, if R < R(D) then the expected 
   distortion $E[\frac{1}{n} d(z^n, T_n(z^n))] > D$. 

b) for R > R(D) there exists a sequence of codes $T_n$ of rate 
   $\log \|T_n\| / n \le R$ such that $E[\frac{1}{n} d(z^n, T_n(z^n))] \le D$. 



In recent years a lot of work has been done generalizing the above 
results to a broader class of sources, evaluating the performance of 
existing systems relative to the achievable optimum, and developing 
methods for evaluation of the R(D) function. The first consideration 
of the actual coding problem was undertaken by Jelinek [3], who showed that 
the sequence of coding functions $T_n$ can possess the above desirable 
properties even if it is restricted to generate tree codes (instead of 
block codes to which Shannon's theorem applies). It was hoped that a tree 
code structure would facilitate the development of a computationally 
feasible encoding algorithm. 

Our work concerns the performance of such algorithms as applied to 
the restricted class of binary symmetric sources [Q(0) = Q(1) = 1/2, 
a = b = 2, d(0,0) = d(1,1) = 0, d(0,1) = d(1,0) = 1]. The algorithms 
themselves are, however, completely general. An example of a tree code 
is given in Figure 1. The various codewords are the sequences associated 
with the $2^5 = 32$ different paths of the tree. A path is specified by a 
binary map sequence $s^5$, which determines at each node level whether the 
upper (0) or the lower (1) branch was taken. Thus the map sequence 
$s^5 = 01101$ corresponds to the codeword $\hat{z}^{10} = 0011101100$. The 
rate of the code of Figure 1 is R = 1/2, so that the theoretically optimal 
achievable average distortion is D = .11. Figure 2 shows an experimentally 
derived ultimate capability of specific codes (believed to be near optimal) 
of constraint lengths 5, 7, 10, and 14. The curve does seem to indicate 
that the ultimate performance of D = .11 will be achievable with codes 
of sufficiently long constraint length. The simulation was carried out 
with the help of a straightforward modification of the Viterbi algorithm 
that necessitates $2^\nu$ steps per encoded source digit pair. The top curve 
in Figure 3 then gives the corresponding distortion performance as a 
function of the number of encoding steps. The algorithm compares the 
beginning subsequence of length $2\nu$ of the source outputs with the 
difference sequences corresponding to the $2^\nu$ initial paths of the code 
trellis (see Fig. 14 of Section V). Each state at depth $\nu$ of the trellis 
has two such paths entering it. For each state, the one of these two 
paths whose distortion from the source subsequence is least is retained 
and the other is eliminated. Extensions of length $\nu+1$ of the retained 
paths are then compared with the initial source subsequence of length 
$2(\nu+1)$ and the elimination process is repeated at each trellis state 
of depth $\nu$. This continues until a preassigned depth r in the 
trellis has been reached. Then the best of the $2^\nu$ "live" paths is 
selected to represent the source output sequence of length 2r. 
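
The elimination procedure just described can be sketched in a few lines. This is only an illustration under stated assumptions, not the simulation program used for Figure 2: the rate 1/2 code with generators $1+D+D^2$ and $1+D^2$ is assumed because it is small, the distortion is the Hamming measure, and the function names are hypothetical.

```python
def branch_output(state, bit):
    """Two code digits for input `bit`; `state` = (previous bit, bit before that).

    Assumed generators: 1+D+D^2 and 1+D^2 (an illustrative choice).
    """
    s1, s2 = state
    return (bit ^ s1 ^ s2, bit ^ s2)


def viterbi_source_encode(source):
    """Return (map sequence, codeword, distortion) for a binary source of even length."""
    assert len(source) % 2 == 0
    # survivors[state] = (cumulative Hamming distortion, map sequence, codeword)
    survivors = {(0, 0): (0, [], [])}
    for t in range(0, len(source), 2):
        pair = source[t:t + 2]
        extended = {}
        for state, (dist, path, word) in survivors.items():
            for bit in (0, 1):
                out = branch_output(state, bit)
                d = dist + (out[0] != pair[0]) + (out[1] != pair[1])
                nxt = (bit, state[0])
                # Of the paths entering a state, keep only the least distorted one.
                if nxt not in extended or d < extended[nxt][0]:
                    extended[nxt] = (d, path + [bit], word + list(out))
        survivors = extended
    # At the preassigned depth, release the best of the "live" paths.
    dist, path, word = min(survivors.values(), key=lambda s: s[0])
    return path, word, dist
```

The work per source digit pair is proportional to the number of trellis states, which is why the step count grows exponentially with the constraint length.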

The next algorithm evaluated is based on the stack principle [4]. 
Let D* be the per letter distortion desired by the user. To be 
realistic (see the previously quoted results) we must have R > R(D*). 
Define a metric distortion function $d^*(z,\hat{z}) = d(z,\hat{z}) - D^*$. 
Then $\hat{z}^l$ will be an acceptable approximation of a source sequence 
$z^l$ if and only if $\sum_{j=1}^{l} d^*(z_j,\hat{z}_j) < 0$ (we assume that 
the code is indefinitely extensible, i.e. that the number of levels in 
the tree is practically infinite). 

Suppose the sequence $z^n$ (n large) was generated by the source, let 
$d^*(s^j)$ denote the metric relative to $z^n$ corresponding to the last 
branch of the path $s^j$ [e.g. $d^*(101) = d^*(z_5,1) + d^*(z_6,0)$ and 
$d^*(100) = d^*(z_5,0) + d^*(z_6,1)$], and let $D(s^j)$ be the cumulative 
metric along the path $s^j$, $D(s^j) = \sum_{i=1}^{j} d^*(s^i)$, where $s^i$ 
are the initial subsequences of length i of $s^j$ ($i \le j$). The stack 
will contain different paths $s^j$ and their cumulative metrics $D(s^j)$, 
and will be arranged in ascending order of the latter (i.e. at the top 
of the stack there will be that path whose $D(s^j)$ is least). 

1. At the beginning of the encoding process, the paths $s^1 = 0$ 
and $s^1 = 1$ are arranged in the stack according to the values of 
D(0) and D(1). 

2. The encoder checks whether the path $s^j$ on top of the stack 
is such that $D(s^j) < 0$. If so, go to step 4; if not, go to step 3. 

3. The top entry $[s^j, D(s^j)]$ is eliminated from the stack, 
the branch metrics $d^*(s^j 0)$ and $d^*(s^j 1)$ are computed, and two new 
entries $[s^j 0, D(s^j 0) = D(s^j) + d^*(s^j 0)]$ and 
$[s^j 1, D(s^j 1) = D(s^j) + d^*(s^j 1)]$ are inserted in the proper 
location into the stack. Go to 2. 

4. The subsequence $z^{2j}$ is encoded into the codeword $\hat{z}^{2j}$ that 
corresponds to the path $s^j$. The stack is cleared of all its entries 
and encoding of the sequence $z_{2j+1}, z_{2j+2}, \ldots$ starts with the 
insertion of two new entries $[s^j 0, D(s^j 0) = d^*(s^j 0)]$ and 
$[s^j 1, D(s^j 1) = d^*(s^j 1)]$ in their proper order into the stack. 
Go to 2. 
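
Steps 1 to 4 can be sketched for a single segment as follows. The sketch rests on several assumptions that are not part of the report: the tree is generated by the generators $1+D+D^2$ and $1+D^2$ starting from an all-zero register, the stack is held in a binary heap, a step budget `max_steps` is imposed, and all names are hypothetical. It stops at the first release of a codeword (Step 4) rather than continuing into the next segment.

```python
import heapq


def branch_digits(path):
    """Code digits on the last branch of `path` (assumed generators 1+D+D^2, 1+D^2)."""
    b = path[-1]
    s1 = path[-2] if len(path) >= 2 else 0
    s2 = path[-3] if len(path) >= 3 else 0
    return (b ^ s1 ^ s2, b ^ s2)


def stack_encode_segment(source, d_star, max_steps=10000):
    """Extend the least-metric path until its cumulative metric D(s^j) < 0.

    Returns (map sequence, codeword, extensions used), or None if the step
    budget or the supplied source digits run out first.
    """
    def metric(path):
        # Branch metric d*(s^j): Hamming distortion of the branch minus 2 D*.
        j = len(path)
        out = branch_digits(path)
        z = source[2 * (j - 1): 2 * j]
        return (out[0] != z[0]) + (out[1] != z[1]) - 2 * d_star

    # Step 1: the two one-branch paths enter the stack.
    stack = [(metric((b,)), (b,)) for b in (0, 1)]
    heapq.heapify(stack)
    steps = 0
    while stack and steps < max_steps:
        total, path = heapq.heappop(stack)      # stack top: least D(s^j)
        if total < 0:                           # Step 2 leads to Step 4
            word = [d for j in range(1, len(path) + 1)
                    for d in branch_digits(path[:j])]
            return list(path), word, steps
        if 2 * (len(path) + 1) > len(source):   # no source digits left to compare
            continue
        steps += 1                              # Step 3: one node extension
        for b in (0, 1):
            child = path + (b,)
            heapq.heappush(stack, (total + metric(child), child))
    return None
```

A heap gives the ordered insertion of Step 3 in logarithmic time; an explicitly sorted list would follow the description more literally at higher cost.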


The bottom curve in Figure 3 is a plot of average distortion achieved 
as a function of the average number of steps necessary to encode a source 
digit pair when the code of constraint length $\nu = 14$ whose ultimate 
performance is D = .116 was used (see Figure 2). The performance 
curve for the stack algorithm dominates that corresponding to the modified 
Viterbi algorithm. 

The stack algorithm is readily generalizable to tree codes of rate 
R = k/n with $2^k$ branches leaving each node and n digits per branch. 
Its suitability is determined by the average number of steps necessary 
to encode a source digit. 



III-A-2. Theoretical Analysis of the Stack Encoding Algorithm 

Our analytical work with the stack algorithm has divided into 
two efforts: finding equations in relevant variables and approximating 
solutions to these equations. Presented here is the result of the first 
effort. 

To facilitate analysis, consider several component processes, all 
running on copies of the same tree and source. These will combine 
to form a stack encoder. Let a > 0 and b < 0. As usual, let a 
node extension include scrutiny of the d branches extending from a 
common parent node. 


Process 1   Suppose an entire tree is explored by the stack 
     algorithm until either the top metric in the stack exceeds 
     a or falls below b, whichever comes first. Define 
     N(a,b) to be the number of extensions in the first of the d 
     subtrees stemming from the tree's root node. 

Process 2   In this process only the first subtree is explored 
     in the stack, again until the stack top exceeds a or falls 
     below b. Let N*(a,b) be the number of extensions. 

Process 3   Here let subtrees 2,...,d be explored, until the 
     stack top exceeds a; b is effectively $-\infty$. If 
     $0 > b_1 > b_2 > \ldots$ are the possible top stack minima in 
     this process, let 

          $\Phi(b_i) = 1$   if the stack top falls to $b_i$ and no further, 
          $\Phi(b_i) = 0$   otherwise. 

Concerning N* and N, certainly 

     $N(a,b) \le N^*(a,b)$                                        (1) 

since in Process 1 searching in the first subtree may be terminated 
by events in the other d-1 subtrees. Process 1 nearly constitutes 
the stack algorithm, and N(a,b) is closely related to its computation. 
In fact, defining $N_T(a,b)$ to be the computations in a stack encoder 
which "gives up" when its stack top falls below b, 

     $E N_T(a,b) = 1 + d\, E N(a,b)$                               (2) 

In (2), the unit term on the right represents the initial 
computation needed to reach the d subtree structures. The d-factor 
follows from the statistically IID behavior of the d subtrees. To 
reflect exactly the four step stack algorithm of the previous section, a 
is set arbitrarily close to zero and b is reduced to $-\infty$, so that 
the algorithm stops only when its top path metric exceeds zero. 

To pursue this further and arrive at an equation in $E N_T$, we 
prove the following lemma about N and N*: 

Lemma 

     $N(a,b^*) = \sum_{b_i > b^*} N^*(a,b_i)\, \Phi(b_i) + N^*(a,b^*) \sum_{b_i \le b^*} \Phi(b_i)$      (3) 

Proof: Case I. $\Phi(b_i) = 1$ for $b_i > b^*$. In Process 1, no part of 
     subtree 1 can be examined whose path metrics fall below $b_i$. 
     On the other hand, if Process 2 with $b = b_i$ can terminate by 
     its stack top rising above a, Stack 1 must never have fallen 
     to $b_i$. Overall, then, Process 1 examines in subtree 1 
     precisely what Process 2 does. 

     Case II. $\Phi(b_i) = 1$ for $b_i \le b^*$. 
     Then all subtree 1 paths with metrics between a and $b^*$ are 
     examined in both Processes 1 and 2. QED 

An expectation operation on (3) now yields 

     $E N(a,b^*) = \sum_{b_i > b^*} E N^*(a,b_i)\, P_3(b_i) + E N^*(a,b^*) \sum_{b_i \le b^*} P_3(b_i)$      (4) 

where $P_3(b)$ = Pr [top stack minimum in Process 3 is b]. 

By a few more maneuvers, we can change (4) into a function of one 
expectation only, $E N_T$. We can write immediately, 

     $E N^*(a,b) = \sum_{\lambda} E N_T(a-\lambda, b-\lambda)\, P(\lambda)$                     (5) 

where 

     $P(\lambda)$ = Pr [a given branch has incremental metric $\lambda$] 

and $N_T(a,b) = 0$ if a < 0 or b > 0. Now combining (2), (4), and 
(5), we get 

     $E N_T(a,b^*) = 1 + d \Big[ \sum_{b_i > b^*} P_3(b_i) \sum_{\lambda} P(\lambda)\, E N_T(a-\lambda, b_i-\lambda)$ 
                    $+ \sum_{b_i \le b^*} P_3(b_i) \sum_{\lambda} P(\lambda)\, E N_T(a-\lambda, b^*-\lambda) \Big]$      (6) 


If $P_3(b_i)$ were known, (6) would constitute a linear difference 
equation in the unknown $E N_T(a,b)$. Standard solution methods could then 
be used to obtain a tight bound on $E N_T(a,-\infty)$, the amount of 
computation necessary for stack encoding. Unfortunately, $P_3(b_i)$ is 
itself a solution to a non-linear difference equation. In fact, let 

     $G(a,x)$ = Prob [in Process 2 with $b = -\infty$ the top of the 
                stack falls below the value x]                     (7) 

Then 

     $G(a,x) = \sum_{a > \lambda > x} P(\lambda)\, G(a-\lambda, x-\lambda) + \sum_{\lambda < x} P(\lambda)$      (8) 

and $P_3(b_i)$ is related to G(a,x) by 

     $G(a,x) = \sum_{b_i \le x} P_3(b_i)$                               (9) 

We do not know how to solve (8), except numerically. In the near future, 
we will do just that, and we shall apply the result to (6) so as to gain 
a better feeling about the behavior of $E N_T(a,-\infty)$. 



III-A-3. Another Tree Encoding Algorithm and a New Source Coding Theorem 

In this section, we describe a source encoding algorithm for use 
with tree codes. Tree searching does not proceed in a stack manner as 
in the preceding section, but instead uses two lists of temporary path 
hypotheses. 

Assume code words for encoding a binary digit IID source have been 
arranged in a tree structure. The tree has rate $R = \log_2 d/n$, with d 
branches stemming from each node and n source approximating binary digits 
on each branch. The object of the encoder is to find a path of branches 
through the tree, the digits of which approximate the source sufficiently 
closely. To measure distance between the source output and various paths, 
we use the Hamming measure 

     $d(z^l, \hat{z}^l) = \sum_{i=1}^{l} [1 - \delta(z_i, \hat{z}_i)]$                     (1) 

where $z^l$ is a source sequence, $\hat{z}^l$ is an hypothesized path, and 
$\delta$ is the Kronecker delta function. 

The encoder operates with two lists of tree path hypotheses in 
arriving at one path for release to the user. The main list functions 
as a temporary "scratch pad," and the auxiliary list is a repository 
for "good" paths. Goodness of paths in these lists is judged by a 
path metric that depends on path length as well as distortion, 

     $\mu(\hat{z}^l) = l D^* - d(z^l, \hat{z}^l)$                              (2) 

Here l is the length of $\hat{z}^l$ (note that l must be a multiple of n). 
D* is the distortion per encoded source digit desired at the end of 
encoding, and $D^* > \Delta(R)$, the inverse rate distortion function 
relative to (1) and the source. Eqn. (2) is justified in earlier reports 
on the Jelinek stack encoder. 



With this path metric in mind, we define two freezing barriers, one 
at metric a > 0, the other at b < 0. Further extension of paths 
whose metrics rise above a will be frozen temporarily and the paths 
removed to the auxiliary list. Paths falling below b normally will 
be dropped forever, i.e. "permanently" frozen. 

Specifically, the algorithm works as follows: 

Step (1)  Starting at the tree root node (which is assigned the 
     metric zero), paths are extended in the main list until all 
     root node descendants crash a freezing barrier and are frozen. 
     Paths which rise above the a-barrier are placed in the auxiliary 
     list in order of their length, the longest being on top. The 
     longest of paths frozen at the b-barrier is also saved. 

Step (2)  When no paths remain in the main list, attention turns 
     to the auxiliary list. In this "good" list, the final node 
     of the longest path (which is on top of the list) now becomes 
     a new root node (metric value 0 is assigned to it) for the 
     main list, and the encoder executes again Step (1). The rest 
     of the auxiliary list is retained and a-barrier crashing paths 
     keep being added to it in the proper order. 

Step (3)  If there are no paths in the auxiliary list by the end of 
     some execution of Step (1), the saved longest path frozen 
     at the b-barrier is chosen to supply the new root node and 
     again metric value 0 is assigned to it. The encoder then 
     executes Step (1) again. 

Definite encoding of the source sequence takes place whenever Step (3) 
is invoked, since only one path is then left. Some stopping rule must 
also be specified that will go into effect if the time elapsed since the 
last invocation of Step (3) is large (as hopefully happens often). 
The analysis of this algorithm is an interesting one. However, the 
scheme has two practical advantages: if b is not too negative (which 
it need not be if D* is not too close to $\Delta(R)$) then the main list can 
be allowed to be quite small. Also, the stack algorithm described in the 
preceding two sections has a start-up problem which is mostly avoided 
here: when encoding takes place there, only a single root node is 
provided and the paths that emerge from it might all approximate the 
source sequence quite badly. 

To analyze and understand the two-list encoder better, we can view 
Steps (1) and (2) in terms of a branching process. In the language of 
Feller, Pg. 293, let the paths that are frozen at a during Step (1) 
be the particles of the branching process, so that the auxiliary list is 
actually a list of untried progeny. With each particle associate also 
the main list computation to follow. Corresponding to the tree root 
node and the first execution of Step (1) is the branching process's 
initial particle, and paths which now crash the a-barrier become the 
first generation of particles. The first generation gives rise to the 
second according to some probability distribution independent from particle 
to particle and determined by the statistics of the main list. We can 
think of the succeeding progeny as occurring in generations, even though 
the encoder does not necessarily exhaust all auxiliary list "particles" 
of one generation before going to the next. The branching process 
either terminates by extinction of progeny, or goes on forever. In the 
former instance, Step (3) is invoked to start a new process. 

Our analysis begins by finding the average computation necessary in 
the main list. We assume both lists are of infinite length, so that the 
parameters of interest are the freezing barriers (a,b) and the hoped 
for distortion D*. It concludes by using the branching process analogy 
to prove the encoder can achieve any distortion $D^* > \Delta(R)$, so long as 
b is less than some b* which depends on D* and a. 

Main List Computation Related to (a,b), D*, and R 

In the main list, let 

     $N_a$ = Number of paths frozen at a-barrier 

     $N_b$ = Number of paths frozen at b-barrier 

     $N_\infty$ = Number of paths remaining forever unfrozen 

We state immediately, but without proof, that the expected value of 
$N_\infty$ is zero under proper conditions: 

Theorem 1   For a tree of rate $R = \log_2 d/n$ used to encode a binary IID 
source with respect to the Hamming distortion measure, 

     $|b - a| < \pi/\omega$ implies $E N_\infty = 0$, 

where $\omega = \omega(R,D^*)$ and $\omega > 0$ for all $D^* \in (\Delta(R), 1/2)$. 

(The proof follows from difference equation methods explored first by 
Zigangirov in a sequential decoding context. The function $\omega(R,D^*)$ 
is made specific in Appendix 5.) 

Assuming $E N_\infty = 0$, the expected number of main list paths frozen 
at the end of Step (1), EN, is 

     $E N = E[N_a + N_b]$                                        (3) 


A short derivation shows that the expected number of extended branches 
present in a tree containing EN paths is related by 



E[branches ] 


( 4 ) 
Customarily, a "computation" is meant to include the scrutiny of d 
branches from their common parent node, so that 

E[ Comps ] = ^ (5) 


Eqns. (3), (4), and (5) measure in various ways the work done in the 
main list. 
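
A counting argument of the following kind relates paths, branches, and computations. It is offered only as a sketch, under the assumption that every computation extends one node into exactly d branches and that the explored tree ends in N frozen paths (its leaves); it is not quoted from the report's own derivation, and C is a symbol introduced here for the number of computations.

```latex
% C = number of computations (extended nodes), N = number of frozen paths (leaves).
% The tree has C + N nodes, and every node except the root ends exactly one branch:
dC = (C + N) - 1
\quad\Longrightarrow\quad
C = \frac{N - 1}{d - 1},
\qquad
\text{branches} = dC = \frac{d\,(N - 1)}{d - 1}.
% Both relations are linear in N, so they carry over to expectations:
E[\text{Comps}] = \frac{EN - 1}{d - 1},
\qquad
E[\text{branches}] = \frac{d\,(EN - 1)}{d - 1}.
```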

It remains to estimate $E N_a$ and $E N_b$. Between these, $E N_a$ is 
of crucial importance to a coding theorem because it corresponds to the 
expected number of descendants of each particle in the analogous 
branching process. Parts of the following proof are inspired by ideas used 
by Gallager, again in sequential decoding analyses. The proof appears 
in the Appendix. 


Theorem 2   Under the hypotheses of Theorem 1, $|b - a| < \pi/\omega$ 
implies 

EN 

a 


-a 

r 

sin a) 

a 


COS (J0 a COS (0^ \ 

sin a) sin' to, 

a b / 


■(6a) 


'-b 


COS 0) 


EN, --4 / -t— 2 

b — -“sin / sin u> a sin tUj. 


(6b) 


$\omega$ and r are functions of D* and R, and are found as shown in 
Appendix 5. $\omega \to 0$ as $D^* \to \Delta(R)$, and r is typically near 
(1-D*)/D*. 

A careful look at (6) reveals that as |b - a| tends to $\pi/\omega$, both 
$E N_a$ and $E N_b$ tend to infinity. In fact, given an a one may choose b 
to make the right hand side of (6a) precisely unity. In this way, R, D*, 
and a, with the aid of Theorem 3, specify a minimal b necessary for 
the encoder to achieve D*. In preparation for Theorem 3, we restate 
this as a 
Corollary   For any given $a < \pi/\omega$, there exists b* such that if 
$|b - a| < \pi/\omega$ and b < b*, then $E N_a > 1$. 

We feel confident that further information about N is available 
from these methods. For instance, higher moments of N may be found 
in a way similar to the proof of Theorem 1. 


Sample calculations have been made for R = 1/2 and D* either 
0.125 or 0.111. $\Delta(1/2)$ is 0.110. 
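
The quoted value of $\Delta(1/2)$, and the barrier limits $\pi/\omega$ listed in Table 1, can be checked numerically. The sketch below uses the standard rate distortion function of the binary symmetric source under Hamming distortion, R(D) = 1 - H(D), where H is the binary entropy; the bisection routine and its name are illustrative only, and the two values of $\omega$ are copied from Table 1.

```python
import math


def binary_entropy(p):
    """Binary entropy H(p) in bits."""
    if p in (0.0, 1.0):
        return 0.0
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)


def inverse_rate_distortion(rate, tol=1e-12):
    """Solve 1 - H(D) = rate for D in [0, 1/2] by bisection."""
    lo, hi = 0.0, 0.5
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if 1 - binary_entropy(mid) > rate:
            lo = mid      # R(mid) still above the target rate: D can grow
        else:
            hi = mid
    return (lo + hi) / 2


print(round(inverse_rate_distortion(0.5), 3))    # 0.11
for omega in (0.789, 0.206):                     # values of omega from Table 1
    print(round(math.pi / omega, 2))             # 3.98, then 15.25
```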




                    D* = 0.125     D* = 0.111 

     $\omega$              0.789          0.206 

     r              6.46           7.98 

     $\pi/\omega$            3.98           15.25 


                         Table 1 

                  Sample Values of r & $\omega$ 





                        D* = 0.111               D* = 0.125 

    a       b        $E N_a$     $E N_b$          $E N_a$     $E N_b$ 

   0.5     -2        0.288     13.4             0.409     17.1 
   0.5     -3        0.310     80.4             0.746     280 
   0.5    -14.5      1.06      2.4x10^13        ∞         ∞ 
   0.5    -15.25     ∞         ∞                ∞         ∞ 
   0.25    -3.0                                 0.805     96.5 
   0.25    -3.5                                 1.28      737 
   0.25    -3.73                                ∞         ∞ 
   0.17    -3        0.669     29.9 
   0.17   -14.5      0.921     3.5x10^12 
   0.17   -15.08     ∞         ∞ 


                         Table 2 

              $E N_a$ and $E N_b$ vs. a, b, and D* 
Coding Theorem Proof Using the Two-List Encoder 

We now prove the source coding theorem for our present source and 
distortion measure using the Two-List Encoder; that is, we show the 
encoder can achieve any distortion greater than $\Delta(R)$. The proof uses 
the fundamental theorem of branching processes (see Feller [5], Pg. 297), 
with the branching process analogous to the encoder. 

Theorem 3   Under the hypotheses of Theorem 1, whenever $E N_a > 1$ 
the expected per source digit distortion produced by the Two-List 
Encoder is at worst D*, for any $D^* > \Delta(R)$. 

Proof: Let an encoder cycle run from the extension of a root 
node to the invocation of Step (3). If the longest b-crashing path is of 
length L then the total distortion for this cycle is LD* - b. Let 
N(M) be the number of cycles it takes to encode a source sequence of 
length M [the last of these cycles may be completed at some sequence length 
that exceeds M]. The distortion per source digit is then upper 
bounded by 

     $D_M \le D^* - \frac{b}{M} N(M)$ 

so that the expected distortion is 

     $E[D_M] \le D^* - \frac{b}{M} E[N(M)] \le D^* - \frac{b}{M} E[N(\infty)]$. 

But by the fundamental theorem of branching processes, $E N_a > 1$ implies 
that a cycle ends with infinite progeny (i.e. never ends) with probability 
$\eta > 0$. Hence 

     $E[N(\infty)] = \sum_{k=1}^{\infty} k (1-\eta)^{k-1} \eta = \frac{1}{\eta}$ 

and 

     $E[D_M] \le D^* - \frac{b}{\eta M}$ 

The theorem is thus proven by taking the limit as $M \to \infty$. QED 

Theorem 3 contains no necessity for a large a, so that a sensible 
encoder would place a as close to zero as possible. With this in mind, 
sample calculations were carried out for R = 1/2 and D* = 0.125. 
For a code chosen at random from the usual random ensemble $EN \approx 40$, 
and b is required to be about -3.1. If D* is lowered to 0.111 
(very near $\Delta(R)$!), $EN \approx 10^{12}$ and $b \approx -14.7$. 

The large literature on branching processes suggests more results 
can be obtained by these methods. We hope in the near future to obtain 
results concerning the auxiliary list size and the computation per 
encoded source digit needed in the main list. 

The theorem is readily generalizable to other finite distortion 
discrete memoryless sources. 
III-B. Permutation Codes as Source Codes 


Permutation codes are a class of codes originally described by 
Slepian for use as a method of achieving reliable transmission of digital 
data over an additive Gaussian noise channel. One variation of these 
codes was considered by Dunn for the vector quantization of data 
from a time discrete, Gaussian, memoryless source. In this study, we 
have extended the work of Dunn in several ways: namely, (1) optimizing 
the parameters of the codes, (2) considering a second variation of the 
codes, (3) developing an efficient encoding algorithm for the codes, 
and (4) deriving some special properties associated with the codes. 

The basic idea is that of block coding (or block quantizing) for 
a time discrete source. The source is thought to emit a sequence of 
statistically independent, identically distributed, random variables 
$X_1, X_2, \ldots$ each of zero mean and variance $\sigma^2$. We will be 
concerned with encoding the first N symbols, $X^{(N)} = (X_1, X_2, \ldots, X_N)$. 
A set of M N-vectors, $C_i^{(N)}$, i = 1,2,...,M, are chosen as code words 
and the source output vector is represented by the closest (in accordance 
with some distortion measure) codeword. The rate of the code is defined 
to be 

     $R = \frac{\log_2 M}{N}$                                          (1) 

and the resulting average distortion D is defined as 

     $D = \frac{1}{N} E\left[ \min_i d(X^{(N)}, C_i^{(N)}) \right]$ 

where $d(X^{(N)}, C_i^{(N)})$ is the distortion between the source vector 
$X^{(N)}$ and the i-th codeword $C_i^{(N)}$. 

Permutation codes are codes for which the M codewords are chosen 
in a particular manner. Two different types of codes are considered and 
are termed Variant I and Variant II codes as in Slepian. Their 
descriptions follow: 

Variant I Codes: Let the first codeword $C_1^{(N)}$ be chosen as the N-vector 

     $C_1^{(N)} = (\underbrace{\mu_1,\ldots,\mu_1}_{n_1},\ \underbrace{\mu_2,\ldots,\mu_2}_{n_2},\ \ldots,\ \underbrace{\mu_K,\ldots,\mu_K}_{n_K})$      (3) 

where $\mu_1, \mu_2, \ldots, \mu_K$ are K real numbers such that 

     $\mu_1 > \mu_2 > \ldots > \mu_K$ 

and 

     $n_1 + n_2 + \ldots + n_K = N$ 

where the $n_i$ are positive integers. The other words of the code are 
chosen as all distinct permutations of the elements of the first codeword. 
There are a total of 

     $M = \frac{N!}{n_1!\, n_2! \cdots n_K!}$ 

codewords. 


Variant II Codes: The first codeword $C_1^{(N)}$ is again given as the N-vector 

     $C_1^{(N)} = (\underbrace{\mu_1,\ldots,\mu_1}_{n_1},\ \underbrace{\mu_2,\ldots,\mu_2}_{n_2},\ \ldots,\ \underbrace{\mu_K,\ldots,\mu_K}_{n_K})$ 

where now the $\mu_i$ are K nonnegative numbers such that 

     $\mu_1 > \mu_2 > \ldots > \mu_K \ge 0$ 

The other words of the code are chosen by assigning a sign (positive or 
negative) to each component of the first codeword and by permuting these 
signed components in all possible ways. The number of codewords in the 
code is now: 

     $M = 2^h \, \frac{N!}{\prod_{i=1}^{K} n_i!}$                                  (7) 

where 

     $h = N$            if $\mu_K > 0$, 
     $h = N - n_K$      if $\mu_K = 0$.                              (8) 
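
The codeword counts for the two variants, and hence the rate of definition (1), are easy to evaluate for a given grouping. The helper below is a sketch with hypothetical names; the flag `last_level_zero` stands for the case $\mu_K = 0$ in (8).

```python
import math


def permutation_code_rate(groups, variant=1, last_level_zero=False):
    """Rate R = log2(M)/N in bits per source symbol for group sizes n_1..n_K."""
    n = sum(groups)
    m = math.factorial(n)
    for n_i in groups:
        m //= math.factorial(n_i)      # distinct permutations of the first codeword
    if variant == 2:
        h = n - groups[-1] if last_level_zero else n
        m *= 2 ** h                    # sign choices on the nonzero components
    return math.log2(m) / n


print(permutation_code_rate((2, 2)))               # log2(6)/4
print(permutation_code_rate((2, 2), variant=2))    # log2(96)/4
```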


The encoding procedure for block codes is in general a very complex 
procedure. In its worst form, each source output vector $X^{(N)}$ must 
be compared with each of the M codewords $C_i^{(N)}$, i = 1,2,...,M, and 
is then represented by that codeword which attains the minimum distortion. 
For very large M this is a horrendous task. Permutation codes are of 
particular interest in that they lead to a relatively easy encoding 
algorithm for distortion measures of the form 

     $d(X^{(N)}, \hat{X}^{(N)}) = \sum_{i=1}^{N} f(|X_i - \hat{X}_i|)$                  (9) 

where $f(|\alpha|)$ is a nonnegative, monotonic, nondecreasing, convex upward 
function of $|\alpha|$ and $\hat{X}_i$ is the i-th component of the vector 
$\hat{X}^{(N)}$ chosen to represent $X^{(N)}$. The encoding algorithm which 
encodes $X^{(N)}$ into the code vector which minimizes the distortion is 
described below for Variant I and Variant II codes. The proof that this 
encoding algorithm minimizes the resultant distortion is given in 
Appendix 6, part A. 
Encoding Variant I Codes: 

1. Replace the $n_1$ largest components of $X^{(N)}$ with $\mu_1$. 

2. Replace the next $n_2$ largest components of $X^{(N)}$ with $\mu_2$. 

     . . . 

K. Replace the $n_K$ smallest components of $X^{(N)}$ with $\mu_K$. 

The result is a permutation of the codeword given in Equation (3) and is 
indeed a codeword in the code. 


Encoding Variant II Codes: 

1. Replace the $n_1$ largest, in absolute value, components of $X^{(N)}$ 
with either $+\mu_1$ or $-\mu_1$, the sign chosen to agree with the sign of 
the component it replaced. 

2. Replace the next $n_2$ largest, in absolute value, components of 
$X^{(N)}$ with either $+\mu_2$ or $-\mu_2$, again the sign chosen to agree with 
the sign of the component it replaced. 

     . . . 

K. Replace the $n_K$ smallest, in absolute value, components of $X^{(N)}$ 
with either $+\mu_K$ or $-\mu_K$, again the sign chosen to agree with the sign 
of the component it replaced. 
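
Both encoding rules amount to a single sort of the source vector. The sketch below illustrates this; the function names and argument layout are hypothetical, `groups` holds $n_1,\ldots,n_K$ and `levels` holds $\mu_1,\ldots,\mu_K$ in decreasing order.

```python
def encode_variant_1(x, groups, levels):
    """Replace the n_1 largest components by mu_1, the next n_2 by mu_2, and so on."""
    assert sum(groups) == len(x) and len(groups) == len(levels)
    # Indices of x ordered from the largest component to the smallest.
    order = sorted(range(len(x)), key=lambda i: x[i], reverse=True)
    code = [0.0] * len(x)
    pos = 0
    for n_j, mu_j in zip(groups, levels):
        for i in order[pos:pos + n_j]:
            code[i] = mu_j
        pos += n_j
    return code


def encode_variant_2(x, groups, levels):
    """Rank on |x_i|; each level takes the sign of the component it replaces."""
    magnitudes = encode_variant_1([abs(v) for v in x], groups, levels)
    return [m if v >= 0 else -m for m, v in zip(magnitudes, x)]


x = [0.3, -1.2, 2.0, 0.1]
print(encode_variant_1(x, (1, 2, 1), (1.5, 0.0, -1.5)))   # [0.0, -1.5, 1.5, 0.0]
print(encode_variant_2(x, (1, 3), (1.5, 0.5)))            # [0.5, -0.5, 1.5, 0.5]
```

The cost is that of sorting N numbers, in contrast to the comparison against all M codewords required by an unstructured block code.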


It should be noted that for identically distributed, statistically 
independent source outputs, all codewords for the Variant I codes are 
equally probable. If the source distribution is symmetric about zero 
the same is true for Variant II codes. 

A commonly used distortion measure is "mean-square error" distortion 
whereby $f(|\alpha|)$ of Equation (9) is given by 

     $f(|\alpha|) = \alpha^2$                                       (10) 
It is shown in Appendix 6, part B that for a given choice of $n_1, n_2, \ldots, n_K$, 
the best choice of the parameters $\mu_1, \ldots, \mu_K$ in the sense of minimizing 
the mean-square error is given by the following equations: 

Variant I 

     $\mu_j = \frac{1}{n_j} \sum_{i = n_1 + \ldots + n_{j-1} + 1}^{n_1 + n_2 + \ldots + n_j} E[X_{(i)}]$,   j = 1,2,...,K      (11) 

where $E[X_{(i)}]$ is the expected value of the i-th largest of N 
independent random variables, i.e., $X_{(1)} \ge \ldots \ge X_{(N)}$. 

Variant II 

     $\mu_j = \frac{1}{n_j} \sum_{i = n_1 + \ldots + n_{j-1} + 1}^{n_1 + n_2 + \ldots + n_j} E[|X|_{(i)}]$      (12) 

where $E[|X|_{(i)}]$ is the expected value of the absolute value of the i-th 
largest of N random variables. That is, the absolute values of N 
random variables are ordered in terms of their magnitudes and $E[|X|_{(i)}]$ 
is the expectation of the i-th largest. For a mean-square error distortion 
measure, and for the choice of $\mu_j$ given by Equations (11) and (12), 
the resulting average distortion is given as 

     $D = \sigma^2 - \frac{1}{N} \sum_{j=1}^{K} n_j \mu_j^2$                         (13) 

The rate of a permutation code for a given N is a function of the 
choice of the groupings n ;iy n 2 s 4 ’ * ’ n k . The highest rate codes occur 
for n^ = 1 , for i = 1,2,..., K = N . (For Variant II codes, in order 



to achieve the largest rate we have the added restriction that ^ > 0 „) 
The maximum rate is 


log 9 N! 


N 


r max 


1 + 


log 2 Nl 




N 


Variant I 


Variant II 


(14) 


and the corresponding mean-square error distortion is

$$D = \sigma^2 - \frac{1}{N} \sum_{i=1}^{N} \mu_i^2 \qquad (15)$$

For a Gaussian source with unit variance, the summation

$$\sum_{i=1}^{N} \mu_i^2$$

is tabulated by David et al. for values of $N$ up to 400. The resulting
distortion for maximum rate Variant I codes is found to be much greater
than the corresponding distortion given by rate-distortion theory [12],
namely

$$D = 2^{-2R} \qquad (16)$$


In fact, the resulting performance is inferior to that of an ordinary
scalar quantizer with 2 equally spaced quantization levels. Thus
we see that if such codes are to be of value, we must have a method for
the judicious choice of the groupings $n_1, n_2, \ldots, n_K$.



Many different choices of groupings $n_1, n_2, \ldots, n_K$ exist which result
in the same rate $R < R_{\max}$. (In fact any permutation of the values of
$n_1, \ldots, n_K$ yields the same rate.) The optimum choice of $n_1, n_2, \ldots, n_K$
and $K$ for a given rate will be that set of parameters which yields
the minimum distortion. For a given $K$, it is shown in Appendix C that
a necessary condition on the choice of $n_1, n_2, \ldots, n_K$ to yield the
minimum distortion is that $n_1 \le n_2 \le \cdots \le n_K$ for Variant II
codes and that $n_1 \le n_2 \le n_3 \le \cdots \le n_i$, where $\mu_i > 0$,
for Variant I codes.

The following approach was used in a computer optimization procedure
to find the best values of $n_1, \ldots, n_K$ and $K$ for a Gaussian source.
Several approximations were used in this algorithm, so the resulting
parameters may not be truly optimum. However, there is reason to expect
that the performance of the codes obtained from this algorithm is
essentially that of the very best codes. The procedure is based upon the
following observations. Define

$$p_i = n_i / N, \qquad i = 1,2,\ldots,K \qquad (17)$$


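Then, for large $n_i$ and $N$, the rate $R$ can be written approximately
as:

$$R \approx \begin{cases} -\displaystyle\sum_{i=1}^{K} p_i \log_2 p_i & \text{Variant I} \\[2ex] 1 - \displaystyle\sum_{i=1}^{K} p_i \log_2 p_i & \text{Variant II} \end{cases} \qquad (18)$$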


Furthermore the distortion (mean-squared error) is given exactly as

$$D = \sigma^2 - \sum_{j=1}^{K} p_j \mu_j^2 \qquad (19)$$

Treating (18) as an equality we can minimize $D$ with respect to $p_1, \ldots, p_K$
subject to the rate constraint. The optimum $p_i$ are given as

$$p_i = \frac{e^{-\beta \mu_i^2}}{\sum_{j=1}^{K} e^{-\beta \mu_j^2}} \qquad (20)$$

where $\beta$ is chosen so that (18) is satisfied. Note that in actuality we
do not have an analytic solution for the best $n_i$ for two reasons.
First, $n_i = p_i N$ may not be an integer and second, $p_i$ is given in terms
of the $\mu_i$ which are, in turn, functions of the groupings $n_i$.
Furthermore the above solution assumes that $K$ is known while we would also
want to find the best $K$.

The flow diagram for the computer algorithm used in finding good
codes is shown in Figure 4. A rough outline of this algorithm can
be found in Appendix 6, part D.

As an example, for $N = 400$, $R \approx 1.5$, $K$ odd, the grouping obtained
is

$$n_1 = 1,\quad n_2 = 4,\quad n_3 = 74,\quad n_4 = 242,\quad n_5 = 74,\quad n_6 = 4,\quad n_7 = 1$$

The resulting rate is $R = 1.47514$, and distortion is $D = 0.18595$.
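
The rate quoted for a grouping can be cross-checked numerically. The sketch below assumes the rate of a Variant I permutation code is $(1/N)\log_2\bigl(N!/\prod_i n_i!\bigr)$ bits per sample, which is consistent with Equation (14) when every $n_i = 1$, and that a Variant II code with all $\mu_j > 0$ adds one sign bit per sample. The function name `permutation_code_rate` is illustrative and does not come from the report.

```python
from math import lgamma, log


def permutation_code_rate(groups, variant=1):
    """Rate in bits/sample of a permutation code with the given group sizes.

    Assumption: the codebook holds N! / (n_1! n_2! ... n_K!) codewords
    (Variant I); Variant II is assumed to add one sign bit per sample.
    """
    n_total = sum(groups)
    # ln(N!) - sum ln(n_i!) via the log-gamma function, to avoid huge integers
    ln_codewords = lgamma(n_total + 1) - sum(lgamma(n + 1) for n in groups)
    rate = ln_codewords / (n_total * log(2))
    return rate + 1.0 if variant == 2 else rate


example = [1, 4, 74, 242, 74, 4, 1]          # N = 400 example quoted above
print(sum(example), round(permutation_code_rate(example), 4))
```

Under these assumptions the example grouping evaluates to a rate close to 1.475 bits per sample, in line with the value quoted in the text.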

The Gaussian order statistics required for Variant I codes were
taken from the table of David et al. The results of this computer
optimization for Variant I codes with $N = 400$ are plotted in Figure 5.
(A smooth curve has been drawn through the resultant R-D points.) Also
plotted on this graph are



1) The rate-distortion curve for the Gaussian independent source
as given by Equation (16).

2) Several points corresponding to optimum scalar quantizers. The
quantization regions and representation points have been optimized
to yield the smallest mean-squared error for that number of
representation points. The rate of the uncoded quantizer is
$\log_2$ (number of quantization points). The coded quantizer's
rate is the entropy of the representation points. See Lloyd [13]
or Max [14].

3) The performance of a uniform quantizer whose spacing is optimized
and whose outputs are then Huffman coded.

4) The performance of some Variant I, $N = 400$ codes found by Dunn.


In conjunction with this figure we see that:

a) The $N = 400$, Variant I codes are superior to Lloyd-Max uncoded
quantizers for $R < 3.7$ and are superior to Lloyd-Max coded quantizers
for $R < 3.2$. Their performance is approximately equal to that of the
uniform quantizer (coded and optimized) over the range $1 \le R < 2.7$.

b) For small rates ($R < 1$), the performance of the Variant I codes
approaches that of the rate-distortion curve. The lowest rate code
plotted in this figure corresponds to the grouping $n_1 = 1$, $n_2 = N-1$.
This code is a simplex code and its rate and corresponding distortion are
given as

$$R = \frac{\log_2 N}{N} \qquad (21)$$

$$\frac{D}{\sigma^2} = 1 - \frac{\left(E[X_{(1)}]\right)^2}{N-1} \qquad (22)$$

where, here, $E[X_{(1)}]$ is the expectation of the largest of $N$ Gaussian
random variables of zero mean and unit variance. For very large $N$,
Gumbel [16] has shown that

$$E[X_{(1)}] \approx \sqrt{2 \ln N} \qquad (23)$$

Combining Equations (21), (22) and (23) we have, for large $N$,

$$\frac{D}{\sigma^2} \approx 1 - \frac{2 \ln N}{N}$$

or

$$\frac{D}{\sigma^2} \approx 1 - 2R \ln 2 \qquad (24)$$


Comparing Equations (24) and (16) we see that the two agree for small
values of $R$. Thus, the simplex Variant I codes are asymptotically
optimum for large $N$. Furthermore it is easily shown that the best
quantizer which has two representation points for $N$ outputs from a
Gaussian, independent source behaves as


= 1 - - R In 2 .(25) 

^ TT 

<3 

Thus, this type of quantizer is not asymptotically optimum.

c) The codes obtained by Dunn are not quite as good as the codes found
by the computer optimization procedure. In particular, the following
two codes are easily compared:

Dunn: n = (5, 5, 35, 40, 65, 100, 65, 40, 35, 5, 5)
R = 2.86367
D = 0.03389

Computer: n = (1, 2, 7, 20, 46, 77, 94, 77, 46, 20, 7, 2, 1)
R = 2.79184
D = 0.03362

The computer generated code achieves essentially the same distortion as
the code of Dunn but at a reduced rate.
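
The small-rate agreement between Equations (24) and (16) noted in item b) can be checked with a first-order expansion. This is only a sketch, and it assumes that Equation (16) is the rate-distortion function of a unit-variance Gaussian source, $D(R) = 2^{-2R}$:

```latex
D(R) = 2^{-2R} = e^{-2R\ln 2}
     = 1 - 2R\ln 2 + \frac{(2R\ln 2)^2}{2} - \cdots
     \approx 1 - 2R\ln 2 \qquad (R \to 0)
```

Since the simplex code has $R = (\log_2 N)/N$, which tends to zero as $N$ grows, the neglected second-order term becomes negligible for large $N$.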



The evaluation of Variant II codes was hampered by the unavailability
of tables for the expected value of absolute Gaussian order statistics,
$E[|X|_{(i)}]$. The only tables found were those of Klatz [17], which give
these statistics only for $N \le 10$. It was reasonably simple to evaluate
the performance of all groupings for small $N$. The results for $N = 10$
are plotted in Figure 6.

It is difficult to draw conclusions on the efficacy of Variant II
codes from the present data since we need to evaluate the performance
of these codes for large values of $N$. A computer program is presently
being written to obtain the expected value of absolute Gaussian order
statistics for large values of $N$.

Two interesting properties of Variant II codes follow.

1. For any $N$, if we choose only one grouping (i.e., $n_1 = N$), then the
representation points are located on the vertices of a hypercube with
coordinates $\left(\pm\sqrt{2/\pi}, \pm\sqrt{2/\pi}, \ldots, \pm\sqrt{2/\pi}\right)$.
The representation points and performance of this code are identical to
those of an optimum 1 bit single-sample quantizer.

2. For $N = 2$ and $n_1 = n_2 = 1$, the eight representation points are
uniformly spaced on a circle of radius

$$4\sqrt{\frac{2}{\pi}}\,\sin\frac{\pi}{8}$$

Although appealing from the standpoint of symmetry, this configuration is
a relatively poor quantizer with rate

$$R = 1.5 \text{ bits/sample}$$

and mean-square distortion

$$D = 1 - \frac{16}{\pi}\sin^2\frac{\pi}{8} \approx 0.25$$

These values are plotted as an asterisk on Figure 6.
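
The geometry of the second property can be checked numerically. The sketch below assumes a unit-variance Gaussian source, for which $E\max(|X_1|,|X_2|) = 2/\sqrt{\pi}$ and $E|X| = \sqrt{2/\pi}$ (standard results stated here as assumptions, not taken from the report), and evaluates the distortion with the form of Equation (13). Variable names are illustrative.

```python
from math import atan2, log2, pi, sin, sqrt

# Expected absolute order statistics for two independent unit Gaussians
mu1 = 2.0 / sqrt(pi)                 # E[max(|X1|, |X2|)]
mu2 = 2.0 * sqrt(2.0 / pi) - mu1     # E[min(|X1|, |X2|)] = 2 E|X| - E[max]

# Codewords are (+-mu1, +-mu2) and (+-mu2, +-mu1): eight points in the plane
radius = sqrt(mu1 ** 2 + mu2 ** 2)
angle = atan2(mu2, mu1)              # angle of the point (mu1, mu2)

rate = log2(8) / 2                   # bits per sample for N = 2
distortion = 1.0 - (mu1 ** 2 + mu2 ** 2) / 2.0   # Equation (13), sigma^2 = 1

print(radius, 4.0 * sqrt(2.0 / pi) * sin(pi / 8))
print(angle, pi / 8, rate, distortion)
```

Under these assumptions the first representation point sits at an angle of $\pi/8$, so the eight points are spaced $\pi/4$ apart on the circle, and the distortion evaluates to roughly 0.25.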
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The method of encoding described earlier for Variant I and Variant II 

codes assumed that the encoding process consisted of replacing the output
of the source by its closest codeword. In actuality, for transmission
over a communications channel (or for storage in a memory) one would
order the codewords and then transmit (or store) the rank order of the
appropriate codeword. We now give a method for achieving this. This
method is similar to Jelinek's [18] version of the Elias variable length
noiseless coding scheme.


The idea of this scheme is to map each of the

$$M = \frac{N!}{n_1!\, n_2! \cdots n_k!}$$

permutations of the vector

$$(\underbrace{\mu_1, \ldots, \mu_1}_{n_1},\ \underbrace{\mu_2, \ldots, \mu_2}_{n_2},\ \ldots,\ \underbrace{\mu_k, \ldots, \mu_k}_{n_k})$$

into a point on the real line in the interval (0,1). These points will
be equally spaced and the mapping will be one-one. Then, various methods
can be used to enumerate these points. In the method described later,
each point is represented by its binary fraction expansion.

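We now give the method to map the sequence

$$U^{(N)} = (u_1, u_2, \ldots, u_N)$$

onto the unit interval. Define the set of integers $I_\ell(i)$, $i = 1,2,\ldots,k$,
$\ell = 1,2,\ldots,N$ as follows:

$$I_1(i) = n_i, \qquad i = 1,2,\ldots,k \qquad (26)$$

$$I_\ell(i) = \begin{cases} I_{\ell-1}(i) - 1 & \text{if } u_{\ell-1} = \mu_i \\ I_{\ell-1}(i) & \text{otherwise} \end{cases} \qquad \ell = 2,3,\ldots,N \qquad (27)$$

and

$$I_\ell(0) = 0, \qquad \ell = 1,2,\ldots,N \qquad (28)$$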


The mapping of 



onto the unit interval, 



) 


is then given as 


n(U W ) 








I 2 ( ' 1 - ) + N (N-l) (N-2) 


i=0 


i=0 



(i)+. 


N-2 



i= 1 


(N-jO- 



k 

Hv 


u N-l 


(i) + 


i=l 


N! 


(29) 


With this procedure, the sequence 



^k 9 ^k 9 ’ ’ ’ ’ Hie 


the sequence 



• M'k’ ’ 9 * 9 ^k 9 ^k-l s ^“k-l 9 * * ° 9 ^k-l' 9 ° " * 9 ^l’^T 9 * * * 9 ^1 


mapping to the point 1.

The binary codeword corresponding to $U^{(N)}$ is derived as follows:
Expand $\pi(U^{(N)})$ into an infinite, unique, binary fraction

$$\pi(U^{(N)}) = S_1(U^{(N)})\,2^{-1} + S_2(U^{(N)})\,2^{-2} + \cdots + S_j(U^{(N)})\,2^{-j} + \cdots \qquad (30)$$

where $S_j(U^{(N)}) \in \{0,1\}$ are chosen so that

$$0 < \pi(U^{(N)}) - \sum_{j=1}^{i} S_j(U^{(N)})\,2^{-j} \le 2^{-i} \qquad (31)$$

Let $Q$ be the smallest integer greater than $\log_2 M$. The binary
codeword $\mathbf{S}(U^{(N)})$ corresponding to $U^{(N)}$ is the sequence

$$\mathbf{S}(U^{(N)}) = S_1(U^{(N)}), S_2(U^{(N)}), \ldots, S_Q(U^{(N)}) \qquad (32)$$

The codeword $\mathbf{S}(U^{(N)})$ specifies $U^{(N)}$ uniquely since
$\hat{\pi}(U^{(N)})$, defined by

$$\hat{\pi}(U^{(N)}) = \sum_{j=1}^{Q} S_j(U^{(N)})\,2^{-j} \qquad (33)$$

falls in the half open interval
$\left[\pi(U^{(N)}) - \dfrac{n_1!\, n_2! \cdots n_k!}{N!},\ \pi(U^{(N)})\right)$.

An efficient decoding method is as follows. Rewrite (29) as 

h ^2 

"<2 <N> > - h (i) - I W [ x ' l2<i> _ 


N (W-l) X 1 ( V X 2 ( V 1 " M-2 


I 3 (i) - ... 


N ~ 2 T , p %-l 

... - I I -Z ll - 1 \ I.. ,(i) 



In order to recover from tt(U^), the decoder follows the fol- 

lowing recursive procedure: 


j = mm j 


1 

: N 


I^i) > tt(U (N) ) 


(35) 


i=l 


Knowledge of allows the decoder to compute I 2 (i) for ~ .... 

Then 


V 1 


h ~ mir v : n 7 x i (l) + V J i 



1,(0 +^7TTl 1 (ji)> I 2 <i) >£ CU (N) > (36) 


i=l 


i=l 


This continues until the next to last step where 


h’ 1 


3 2 ~1 


j N -l = min V : N 



i=l 


I 1 ( X ) + N(N-1) "W /_ { I 2 (x) + ** 

i=l 



+ N! X 1^7 I 2^2 ) *** I N-2 (j N-2' ) / / I N-1^ 1 ^ - n ^ 

i=l 


(37) 


The final step determines 1 as 
r J N 


h k : i n < ^ " 1 


(38) 


APL-type encoding and decoding algorithms are given in Appendix 6, 
part E. 
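
The ranking idea can be illustrated with a generic lexicographic enumeration of multiset permutations. The following is a minimal sketch and is not the report's Equation (29) or its APL programs: it assigns each arrangement an integer index from 0 to $M-1$ rather than a point on the unit interval, and it labels positions by group index (0 for $\mu_1$, 1 for $\mu_2$, and so on). All function names are hypothetical.

```python
from math import factorial


def multinomial(counts):
    """Number of distinct arrangements: N! / (n_1! n_2! ... n_k!)."""
    total = factorial(sum(counts))
    for c in counts:
        total //= factorial(c)
    return total


def rank(seq, counts):
    """Lexicographic index of seq among all arrangements with these counts."""
    remaining, left = list(counts), sum(counts)
    perms, index = multinomial(counts), 0
    for s in seq:
        for t in range(s):
            # arrangements that place a smaller group label in this position
            index += perms * remaining[t] // left
        perms = perms * remaining[s] // left
        remaining[s] -= 1
        left -= 1
    return index


def unrank(index, counts):
    """Inverse of rank: recover the arrangement from its index."""
    remaining, left = list(counts), sum(counts)
    perms, seq = multinomial(counts), []
    while left > 0:
        for t, c in enumerate(remaining):
            block = perms * c // left      # arrangements starting with label t
            if index < block:
                seq.append(t)
                perms, left = block, left - 1
                remaining[t] -= 1
                break
            index -= block
    return seq


counts = [2, 1, 1]                          # n_1 = 2, n_2 = 1, n_3 = 1
M = multinomial(counts)
bits = (M - 1).bit_length()                 # fixed-length index size in bits
print(M, bits, unrank(0, counts), unrank(M - 1, counts))
```

The decoder in this sketch peels off one position at a time, which mirrors the recursive structure of the decoding procedure described above, although the exact arithmetic of the report's method may differ.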

The following is a summary of the various steps required in a
quantization scheme based upon permutation codes. For convenience, only
Variant I codes are discussed.

1. The outputs of a source are subdivided into blocks of $N$ symbols.

2. The positions of the $n_1$ largest samples, $n_2$ next largest
samples, ..., $n_K$ smallest samples are noted.

3. This position vector is coded into binary digits by Equations
(29), (30) and (32).

4. The binary digits are decoded into a position vector by the
method described by Equations (35)-(38).

5. The representation vector is then obtained by placing in the
largest $n_1$ positions the real number $\mu_1$, in the next largest $n_2$
positions the real number $\mu_2$, etc., and in the smallest $n_K$
positions the real number $\mu_K$. If available, the values to be used for
$\mu_j$ are those given by Equation (11). Alternatively, the encoder
could collect the sample order statistics and transmit these numbers
to the receiver at the end of the transmission. The receiver would
then use these sample statistics as if they were the actual order
statistics.
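
Steps 2 and 5 of this summary can be sketched as follows. This is an illustrative sketch only: the function names are hypothetical, the $\mu_j$ values in the usage example are made up rather than computed from Equation (11), and the binary coding of steps 3 and 4 is omitted.

```python
def variant1_positions(block, groups):
    """Step 2: label each sample with its group (0 = the n_1 largest samples)."""
    order = sorted(range(len(block)), key=lambda i: block[i], reverse=True)
    labels = [0] * len(block)
    start = 0
    for g, n in enumerate(groups):
        for i in order[start:start + n]:
            labels[i] = g
        start += n
    return labels


def variant1_reconstruct(labels, mus):
    """Step 5: place mu_j in every position that belongs to group j."""
    return [mus[g] for g in labels]


block = [0.3, -1.2, 2.0, 0.1]        # one block of N = 4 source outputs
groups = [1, 2, 1]                   # n_1, n_2, n_3
mus = [1.5, 0.0, -1.5]               # illustrative values, not from Equation (11)
labels = variant1_positions(block, groups)
print(labels, variant1_reconstruct(labels, mus))
```

Because the label vector is always an arrangement of a fixed multiset, it is exactly the object that the rank-order coding of step 3 indexes.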
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Captions for Figures of Part -III 


Figure 1. An example of a tree code.

Figure 2. Best average distortion achievable by near-optimal convolutional
codes of constraint length 5, 7, 10, and 14.

Figure 3. Average distortion vs. number of encoding steps of the
Viterbi encoding algorithm when used with the codes of
Figure 1. Also plotted is the average distortion vs. average
number of encoding steps of the Stack Algorithm when used with
the code of constraint length 14 whose ultimate performance
is given in Figure 2.

Figure 4. Flow-chart for determining optimal groupings $(n_1, n_2, \ldots, n_K)$
for permutation codes.

Figure 5. Comparison of Variant I-type code performance with that achievable
by quantizers and with the rate-distortion function for Gaussian
sources.

Figure 6. Short Variant II-type code performance compared to that
achievable by quantizers and to the rate-distortion function for
Gaussian sources.
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APPENDIX 1 


DETAILS OF STACK AND MAP MAINTENANCE AND PURGING
MANAGEMENT OF THE MAP OF VISITED NODES AND ITS PURGING

The following purging strategy will be based on the requirement
that if there is any entry of depth I in the stack, the decoder has
made a final decision about information digits of depth at least I-t,
where t is some conveniently large integer exceeding the constraint
length $\nu$ of the code (a good rule of thumb seems to be $t \approx 3\nu$).

All variable names used in this description are those used in the
FORTRAN stack decoding program. The operations outlined below are
in addition to those already in that program.

1. Before a node is extended the decoder checks whether
0 < I - M1(NPOINT(N1)) < t. If not, the decoder sets NPOINT(N1) = 1
(the location of the root node is M1(1) = -L1MASK). Also, should
I < IMAX-t, the node is dropped from the stack and no extension is made.

2. At the beginning of the decoding process, we set M1(J) = -t,
J = 2,3,...,IMAP, where IMAP is the number of locations in the map.
There will be a pointer LOCPUR whose initial value is 2. When a
new map entry is to be made corresponding to depth I, the decoder checks
whether

I - M1(LOCPUR) > t + 1

If so, the entry is made into location LOCPUR. If not, then we increment
LOCPUR by 1 and try again (when LOCPUR = IMAP, instead of incrementing,
LOCPUR is set equal to 2). If the search has been unsuccessful for
IMAP-1 tries, a map overflow is declared. One may stop at that point
or take a risk and replace that entry whose M1(J) value is smallest.
LOCPUR would then be set to J.



3. Decision Making

When a node is to be extended such that I = IMAX, we set IMAX =
IMAX + 1 and make a decision on the node at depth IMAX-t = I+1-t.
This is done as follows:

Set M11 = I, NPO1 = NPOINT(N1)

CASE I: M1(NPO1) = I+1-t. In this case the decision is a 1.
CASE II: NPO1 = 1 or M1(NPO1) > M11. In this case the decision is a 0.
CASE III: Neither of the above. In this case set M11 = M1(NPO1)
and NPO1 = MP(NPO1) and repeat above.
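
Rules 2 and 3 can be sketched in a few lines. This is an illustrative Python rendering under stated assumptions, not the FORTRAN program itself: the arrays are 1-based lists with index 0 unused, location 1 is taken to be the root, the comparison in rule 2 is copied as the strict inequality shown in the scanned text (a non-strict one may have been intended), and the function names are hypothetical.

```python
def find_map_slot(m1, locpur, depth, t):
    """Rule 2: circular search, over locations 2..IMAP, for a reusable map slot.

    Returns the chosen location, or None when a map overflow is declared.
    """
    imap = len(m1) - 1                      # index 0 is unused
    for _ in range(imap - 1):               # at most IMAP-1 tries
        if depth - m1[locpur] > t + 1:      # entry is old enough to overwrite
            return locpur
        locpur = 2 if locpur == imap else locpur + 1
    return None


def decide_digit(m1, mp, start, depth, t, root=1):
    """Rule 3: decide the digit at depth (depth + 1 - t) by following MP."""
    target = depth + 1 - t
    m11, npo1 = depth, start
    for _ in range(len(m1)):                # guard against malformed chains
        if m1[npo1] == target:              # CASE I: a 1-branch at the target
            return 1
        if npo1 == root or m1[npo1] > m11:  # CASE II: root or overwritten entry
            return 0
        m11, npo1 = m1[npo1], mp[npo1]      # CASE III: step back along the path
    raise RuntimeError("pointer chain did not terminate")


# A path with 1-branches at depths 3, 5 and 8, stored in locations 2, 3 and 4
m1 = [None, -100, 3, 5, 8]
mp = [None, 1, 1, 2, 3]
print(decide_digit(m1, mp, start=4, depth=9, t=5))   # target depth 5
print(decide_digit(m1, mp, start=4, depth=9, t=6))   # target depth 4
```

In this reading the map records only the depths at which a path takes a 1-branch, so reaching the root or an overwritten entry before hitting the target depth means the digit at that depth is a 0.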

Argument why strategy works:

Because of 1, when the entry at location NPO1 was made then either
MP(NPO1) = 1 or M1(NPO1) - M1(MP(NPO1)) < t. In the first case,
the value MP(NPO1) = 1 is either natural, or results from application
of rule 1. In either case, at depths M1(NPO1)-1, M1(NPO1)-2, ..., M1(NPO1)-t,
the path has 0 branches only. Suppose the latter case is true. Then
the entry at MP(NPO1) may be replaced only (see 2) by an entry whose
value M1' satisfies M1' > M1(MP(NPO1)) + t+1, i.e., such that M1' >
M1(NPO1). The new "unnatural" entry will then be recognized by the
decision procedure (as the instance M1(NPO1) > M11 of CASE II).

Note from 2 and 3 that the replacement takes place when the decision
about depth M1(MP(NPO1)) has already been taken. Thus when the stop
of CASE II occurs, a decision is to be taken about a branch inside the
depth interval (M1(MP(NPO1)), M1(NPO1)), and all such branches are
0's by definition.

Finally, if the stop of CASE I occurs, M1(NPO1) is the original entry
at NPO1, and does correspond to a 1-branch.

The fact that either CASE I or II will eventually occur need not
be labored.

Purging of Map when the depth of definitely decoded digits is not
given by formula J-t where t is a fixed constant and J is the depth
of furthest advance in the tree.

Note: A good decision method may be: whenever the likelihood on
some path $s^J = s_1, \ldots, s_J$ exceeds a maximal threshold $T$, all digits
$s_1, \ldots, s_k$ ($k < J$) are decided such that the likelihoods
$L(s^i) < T - a$, for $i = 1,2,\ldots,k$, where the value of $a$ is chosen in
some convenient manner.

Assume that in accordance with some decision strategy, all message
digits at levels 0,1,2,...,t are definitely decoded for some t > 0.

We will create 2 new arrays: M2[IMAP] and NPTM1[KSTACK] (in addition
to those arrays that are utilized in the original FORTRAN Stack Decoding
Program). Their values will be

a) Initially: M1(1) = -L1MASK, NPOINT(1) = 1, NPTM1(1) =
-L1MASK, M1(J) = -L1MASK for J = 1,...,IMAP.

b) Suppose a node at location N1 of depth I1 is being extended,
the 1-extension goes into stack location N2, and the newly created
map location will be J. Then we leave NPOINT(N1) and NPTM1(N1)
as before. We set M1(J) = NPTM1[N2] = I1+1, MP(J) = NPOINT(N1),
M2(J) = NPTM1[N1], NPOINT(N2) = J. As a result of the above strategy,
as long as no map location is purged, we will always have

M2[J] = M1(MP(J)) (1)

and

NPTM1(K) = M1(NPOINT(K)) (2)

The relations (1) and (2) will then provide a check on pointer validity.

MAP PURGING:

When levels 0,1,...,t have been definitely decoded, no node of
depth I ≤ t will ever be extended, and all map locations J such that
M1(J) ≤ t will be available for re-assignment. This can be done by
a pointer LOCPUR that is initially set to 1. LOCPUR is incremented by
1 until it has a value such that M1(LOCPUR) ≤ t. In that case the
new map entry will go into location LOCPUR.

DECODING DECISIONS

Suppose decisions at levels 0,1,...,t have been made and a decision
at levels t+1,...,t+j is to be made next (j > 1 for instance when
decisions are likelihood-oriented) with node at location N1 determining
the choice. Set NPO1 = NPOINT(N1), NPTM = NPTM1(N1) (we assume that
T > t+j).

1. If NPTM > t go to 2. Otherwise stop.
The digits at levels t+1,...,t+j are those revealed by the map so
far (i.e. those found by the usual following of back pointers in the map).

2. If M1(NPO1) = NPTM go to 3. Otherwise stop.
Then for the purposes of the decision all digits at levels lower than
NPTM are zeros. The digit at level NPTM is a 1 and digits at levels
higher than NPTM are those revealed by the map so far.

3. Set NPTM = M2(NPO1), NPO1 = MP(NPO1) and go to 1. The
map digits revealed so far are valid.

Note: This procedure is successful because any new entry has a value
of M1 that exceeds the old value of M1. Therefore if an old pointer
NPO1 is involved, we will surely get M1(NPO1) > NPTM.
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STACK MAINTENANCE AND PURGING STRATEGY WHEN PATHS ARE SPECIFIED BY 
ACTUAL SEQUENCES OF INFORMATION DIGITS OR PARITY DIGITS 
Stack: Has locations called M1 of $\nu-1+k$ digits, the rightmost being
the most recent, some of the leftmost possibly dummies. It has
an I counter indicating the depth of the path and a pointer P1
to the preceding location in the map. In the forthcoming discussion
it is assumed that M1 contains an info. sequence. If parity sequences
are involved, only step 11 need be changed in accordance with
Note I below.

Map: Has locations called M2 of k digits (no dummies here), pointer
MPP indicating the preceding M2-location in the map, and pointers
MPL indicating the next M2 location of the same depth.

Table: Entries A(1),...,A(j) indicate the values of the various
"live" depths that exist in the map. Entries B(1),...,B(j),
C(1),...,C(j) are pointers. B(i) points to the first
location and C(i) to the last location in the map of depth
A(i), i = 1,...,j. B(i) is chained to C(i) by means of the
pointers MPL. In fact, $C(i) = \mathrm{MPL}^{t}(B(i))$ where t is such that
$\mathrm{MPL}^{t+1}(B(i)) = 0$ and $\mathrm{MPL}^{r}(B(i)) \ne 0$ for r = 0,1,...,t.
B(j+1) points to the first available location of the map.

INITIALIZATION

A(1) = -(j-1), ..., A(j) = 0; B(1) = ... = B(j) = 0;
B(j+1) = 1, MPL(i) = i+1, i = 1,2,...,||M2||-1;
MPL(||M2||) = 0.

Rest is initialized to 0.
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OPERATION 

1. Upon obtaining for extension some stack location $\beta$, the decoder
checks whether $I(\beta) = (\nu-1) + \ell k$ for some $\ell = 1,2,\ldots$
If $I(\beta) \ne (\nu-1) + \ell k$ go to 11, else continue.

2. If $I(\beta) < I_{\max}$ go to 6, else continue.

3. If $I_{\max} < (\nu-1) + (j+1)k$ go to 5, else continue.

4. Go through’ the chained list El (/}) ~ MPP(El(j3)) ~ ... ~ MPP^ 
(Pl(/3)) and release the digits in location M2(MPP^ ^ (Pl(j3))) to the 
user . 

5. Find the value t* such that A(t*) = $ -j and .make available 

to the map those -locations that are -linked to B(t*). This is done by 
setting MPL(C(t*)) < — B(j+1) and B ( j+1) B(t*) . 

If B(j+1) = 0, map overflow takes place and stop. Otherwise, 
set A(t*) , B(t*)«- C(t*) <6— B(j+1), B(j+l)4r- MPL(B(t*)) , 

MPL(B (t *) )i 0 , -MPP(B(t*)) <-Pl(/3) , and Pl(j3) 4r~ B(t*) . Copy 

the last k digits of Ml(/3) into M2(B(t*)). Go to _11 . 

6. If $I_{\max} - I(\beta) < \nu-1+jk$ go to 8, else continue.

7. In this case all of the decisions about digits contained in
M1($\beta$) have already been made and this entry should therefore be purged
from the stack. Go to 12.

8. If there is no t such that $A(t) = \ell$ go to 14, else continue.

9. Find t such that A(t ) = ^ . See if there exists a loca- 

+ + + 
tion ot -linked to B(t ) such that Ml (o' ) = Bl(/1) (ffhis requirement 

4 - 

is ignored if = min A(t) and the -contents of -M2(o ') are identical 

t 

+ + 

to the last k digits of Ml(/3) . If o’ exists, set Pl(jS) < — O 

*4* 

and go to 11, else -continue (Note: if B(t ) = 0 then no a exists 

by definition) . 



10. If B(j+1) = 0 , map overflow takes place and stop. Otherwise 

set B + < — B(t + ) s B(t + ) B(j+-1) , B(j+1)4 MPL(B(t + )} MPL(B (t + ) <4— B + 

-f 

(this puts old B(j+1) to the top of the t set, and establishes a 

+ + + 

new top for the set j+1) . If B = 0 then set C(t ) ^ — B(t ) . 

Also, set MPP(B(t + ))<; — Pl(/3) and Pl(/3) 4 — B(t + ) . Finally, copy 
the last k digits of Ml(/3) into M2(B(t"*")). 

11. The rightmost $(\nu-1)$ digits of M1($\beta$) are used to find the
likelihoods $\lambda_0$ and $\lambda_1$ of the two extensions of the path on top of
the stack. If the 0-extension stays at stack location $\beta$ and the
1-extension goes into location $\beta_1$, then $I(\beta) \leftarrow I(\beta_1) \leftarrow I(\beta) + 1$,
$P1(\beta_1) \leftarrow P1(\beta)$, M1($\beta$) is shifted left by 1 stage, a 0 being
entered into the rightmost stage, M1($\beta$) is copied into M1($\beta_1$) and
a 1 is entered into the rightmost stage of the latter,
$\mathrm{CUM}(\beta_1) \leftarrow \mathrm{CUM}(\beta) + \lambda_1$
and $\mathrm{CUM}(\beta) \leftarrow \mathrm{CUM}(\beta) + \lambda_0$. Appropriate pointers to locations $\beta$ and
$\beta_1$ are set in the auxiliary stack as usual.

12. Find the top of the stack. 

13. Go to 1. 

14. This is the case then I max " (j**l)k >I(/3) > I lx " [(v-l)+jkj 

so there exist, some digits in Ml(/3) that have not been decided yet, 

but the pointer El(/3) does not point to any valid entry, and furthermore 

min A(t) > ^ . Go to 11. 
t 

NOTE I. 

If stack is not to contain message digits but rather the digits of 
the parity vector, step 11_ of the procedure must be modified. In this 
case, what is to be saved are the parity digits. 



We have two parity sequences P^(D) and PgO 0 ) ( see ' of 

II-A-1) that must be saved. Furthermore, 



s 

n 


+ p 


n 

i i 0 


p" +1 

i 


(D) 



P”(D) + P 


n ’ 
i» 0 


G“(B) 


i 


1,2 


We assume as previously, that q “ g^g ~ s l,?\ ) -l “ S l,v-1 = 1 
G (D) and G 0 (D) being of degrees X-l and y-1, respectively. 
Therefore, Ml(j3) must contain' k + (A-l) + (v~l) stages. One 
possibility is that its contents are given by 


n 


3 n-k’ ’ ’ ' ! S n-1 ! P 1 , O’ 


n n ,n 

tPl ) X-2 ,p 2,0’ ‘ ’ p 2, y-2] 


The other possibility is to save y-1 positions in the map by taking 

advantage of the fact that pj^(D) and p”“J, . . . ,P^ J determine 

s ... s uniquely by use of the circuit of Fig. lb of Part II. 
n-1’ ’0 

In this case Ml(/3) would contain 


n-k 


n-1 n 

i Pi n> P 


1,0’ •••’ P 1 J 0’ F 1,0’ P 1 ) 1’ '" lF l,\-2 ,tJ 2,0 J •••’ P 2,v-2J 


We will denote The coefficient vector of P n (D) by ’p 

i 

Therefore we get the following two possible alternatives to step 11, 


r 1 2 -, , 1 

11a . (Ml (/?) contains [s,p ,p J, map contains s) . s , p 


and 




n 


p 4, are shifted separately to the left, the leftmost digits p^ ^ and 

P 2 0 bein § used to compute the likelihoods X^ and X^ of the two^ 

extensions of this path. After the shift the leftmost digits are dumped 

1,2 

and a 0 is supplied to all three rightmost positions of s, p and p . 



If the 1-extension goes into location then I(V?)4 — I(/3^)^ — I(/3)+l, 

Pl(jEL).£_i-Pl(j3) and Ml (j3- ) <: — Ml<j3) + [0,...,1, g 1 ^ 2 ] where 

L m 1 ^ A_» -j 

J = 8 1,1 S '* ,,S 1,X-1 and ,§ = g 2, l 5 ’ * ' I§ 2 ; v-1' Finally ’ 

CUM(^),^ — s.CUM(/3) + and CUM(/3)^ CUM(/3) + X q . Appropriate pointers 

to locations /? and ( 3 -^ are set in the auxiliary stack as usual. 

C l 2 "i 

p*,p , p J, map contains p* , where 

V l "' J Ay 

p* = p!. 1 ...,p" ^ . (p*,? 1 ) and (p 2 ) are shifted separately left, 

the digits p 1 ? n and p^ being used to compute the likelihoods A 

1 j U « j U O 

and X^ after two extensions of this path. After the shift the leftmost 
digits of (p*,p ) and (p -1 ) are dumped and a 0 is supplied to both 

''N/ fs*» 

rightmost positions. If the 1-extension goes into location then 

I(j3)«— K^)^— I^+l.PC/S^^— P(j8), and Ml^) <— Ml(jS) + [o, O^g 1 , g 2 

Finally, GUM(/3 ;L )< — CUM(/3)+A and CUM(jS)<f — CUM(/3)+A o . Appropriate 

pointers to locations /3 and are set in the auxiliary stack as 


usual . 



APPENDIX 2 


TABLE LOOK-UP FOR MULTIBRANCH ADVANCE 


I. Binary Symmetric Channel - Systematic Code


Suppose we wish to advance $\ell$ message bits at a time, and let
us assume a rate 1/2, systematic code.

We will describe how the move forward is carried out at some time
$i$ at which the parity state polynomial is

$$P_i(D) = p_1(i) + p_2(i) D + \cdots + p_{\nu-1}(i) D^{\nu-2} \qquad (1)$$

where $\nu$ is the constraint length of the code. We will assume that the
generator polynomial is

$$G(D) = 1 + g_1 D + \cdots + g_{\nu-1} D^{\nu-1} \qquad (2)$$

Suppose the next $\ell$-digit information polynomial is

$$S_i(D) = s_{i+1} + s_{i+2} D + \cdots + s_{i+\ell} D^{\ell-1} \qquad (3)$$

Then the first-position transmitted polynomial is

$$X_1(D) = S_i(D) \qquad (4)$$

and the second-position transmitted polynomial is

$$X_2(D) = [P_i(D) + S_i(D) G(D)]^{\ell} \qquad (5)$$

where $[\ ]^{\ell}$ denotes truncation of $\ell$th and higher powers.

Next, suppose first and second position polynomials $Y_1(D)$, $Y_2(D)$
are received, respectively, and it is desired to find the $k$th most
likely sequence $s_{i+1}, \ldots, s_{i+\ell}$ that could have caused it
($k = 1, 2, \ldots, 2^{\ell}$).

If the channel is BSC, then the answer depends strictly on the weight
of the difference polynomials

$$Z_1(D) = X_1(D) + Y_1(D), \qquad Z_2(D) = X_2(D) + Y_2(D) \qquad (6)$$

But, note from (5) and (4) that

$$\begin{aligned}
Z_2(D) &= Y_2(D) + [P_i(D) + X_1(D) G(D)]^{\ell} \\
&= Y_2(D) + [P_i(D) + Z_1(D) G(D) + Y_1(D) G(D)]^{\ell} \\
&= \left\{ [P_i(D) + Y_1(D) G(D)]^{\ell} + Y_2(D) \right\} + [Z_1(D) G(D)]^{\ell} \\
&= B(D) + [Z_1(D) G(D)]^{\ell}
\end{aligned} \qquad (7)$$

where $B(D)$ is independent of $Z_1(D)$.

It thus follows from (7) that we can arrange tables that will be
useful in evaluating likelihoods and identities of the most likely
branches leaving a node.

The first table, called LTABM, lists for each of the $2^{\ell}$ possible
different values of $B(D)$ the weight-ordered sequence of the $2^{\ell}$
different outgoing branches $Z_1(D)$ (the weights are simply $\mathrm{wt}(Z_1(D))
+ \mathrm{wt}(Z_2(D))$, both of which depend on $Z_1(D)$ and $B(D)$ only).

The second table, called LTABW, lists for each $B(D)$ the weights
corresponding to the outgoing branches of LTABM.

A third table, LIK, gives the correspondence between weights and
likelihood values.

Finally, it might be useful to have a fourth table, called CODEY,
that would supply the correspondence between $Y_1(D)$ and $[Y_1(D) G(D)]^{\ell}$.


Suppose the $k$th most likely outgoing branch was wanted; one would
proceed as follows:

1) Look up $[Y_1(D) G(D)]^{\ell}$ in CODEY and form

$$B(D) = [P_i(D)]^{\ell} + [Y_1(D) G(D)]^{\ell} + Y_2(D)$$

2) Find the $k$th entry $Z_1(D)$ in the $B(D)$-row of LTABM and
form the corresponding message sequence

$$S_i(D) = Z_1(D) + Y_1(D)$$

3) Form the next parity state polynomial $P_{i+\ell}(D)$ recursively
as follows

$$P_{i+1}(D) = \left\{ D^{-1} [P_i(D) + s_{i+1} G(D)] \right\}^{*}$$

$$\vdots$$

$$P_{i+\ell}(D) = \left\{ D^{-1} [P_{i+\ell-1}(D) + s_{i+\ell} G(D)] \right\}^{*}$$

where $*$ denotes the dropping of the $D^{-1}$ term. Before generation
of $P_{i+j}(D)$, the coefficient $p_1(i+j-1)$ is stored in the map.

4) Find the weight $w_k$ of the $k$th entry in the $B(D)$-row of
LTABW and look up in LIK the likelihood of the corresponding branch.
The latter is then used in forming the cumulative likelihood of the new
path.
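
The construction of LTABM and LTABW and the look-up in steps 1) and 2) can be sketched with polynomials over GF(2) stored as integer bitmasks (bit $i$ holds the coefficient of $D^i$). This is a minimal illustration under stated assumptions, not the report's program: the function names are hypothetical, ties are broken by the numeric value of $Z_1$, and the CODEY table is replaced by a direct carry-less multiplication.

```python
def clmul(a, b):
    """Product of two GF(2) polynomials held as bitmasks (carry-less multiply)."""
    out = 0
    while b:
        if b & 1:
            out ^= a
        a <<= 1
        b >>= 1
    return out


def weight(p):
    """Hamming weight of a polynomial's coefficient vector."""
    return bin(p).count("1")


def build_tables(g, ell):
    """LTABM[b] lists Z1 values by increasing weight; LTABW[b] lists the weights."""
    mask = (1 << ell) - 1
    ltabm, ltabw = [], []
    for b in range(1 << ell):
        scored = sorted(
            (weight(z1) + weight(b ^ (clmul(z1, g) & mask)), z1)
            for z1 in range(1 << ell)
        )
        ltabm.append([z1 for _, z1 in scored])
        ltabw.append([w for w, _ in scored])
    return ltabm, ltabw


def kth_branch(k, p, y1, y2, g, ell, ltabm, ltabw):
    """Return (message bits S, branch weight) of the k-th most likely branch."""
    mask = (1 << ell) - 1
    b = (p & mask) ^ (clmul(y1, g) & mask) ^ y2     # B(D) of step 1
    z1 = ltabm[b][k - 1]
    return z1 ^ y1, ltabw[b][k - 1]                 # S = Z1 + Y1 (step 2)


g, ell = 0b1011, 3            # G(D) = 1 + D + D^3, advance 3 bits at a time
ltabm, ltabw = build_tables(g, ell)
p, y1, y2 = 0b101, 0b110, 0b011
print([kth_branch(k, p, y1, y2, g, ell, ltabm, ltabw) for k in (1, 2, 3)])
```

Because the tables depend only on $G(D)$ and $\ell$, they would be built once, and each node extension then costs a few exclusive-ors and two table look-ups.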

NOTE: The value of $B(D)$ should really be computed only when the node
is extended for the first time, i.e. along the most likely branch. Then
it should be stored for later use if the $k$th ($k = 2, 3, \ldots, 2^{\ell}$) most
likely branch is needed.

NOTE: Obviously a straightforward modification of this method will
apply to any rate code. Regardless of the rate, the LTABM and LTABW
tables will have $2^n$ entries, where $n$ is the number of received bits
that correspond to a path extension (in the preceding discussion, $n = 2\ell$).


II. BINARY INPUT - M-ary OUTPUT SYMMETRICAL CHANNEL - SYSTEMATIC CODE

We will consider the situation where for simplicity the received
symbols can be written as pairs $(Y,V)$, where $Y$ is binary and $V$ has
alphabet of size $m = M/2$. Furthermore, the channel structure is such
that

$$w(0,V/0) = w(1,V/1), \qquad w(0,V/1) = w(1,V/0) \qquad (8)$$

for all $V \in \{0,1,\ldots,m-1\}$. In this formulation the Ames channel has
$M = 4$. Note from (8) that the likelihood

$$\log \frac{w(Y,V/X)}{w(Y,V)} - R = \log \frac{f_1(Y \oplus X, V)}{f_2(V)} - R \qquad (9)$$

is a function of the pair $(Y \oplus X, V)$ only. Assuming that

$$\min_V \frac{f_1(0,V)}{f_2(V)} > \max_V \frac{f_1(1,V)}{f_2(V)} \qquad (10)$$

(which is true on the Ames Channel), the following strategy is very
reasonable.

Create a table LTAB which for each of the $2^{\ell}$ possible different
values of $B(D)$ (they are based on the Y-components of the received
symbols only!) lists the weight-ordered sequence of $2^{\ell}$ different
outgoing branches $Z_1(D)$ (note that it will be handy to list
$Z_2(D)$ also). Ties in weights are resolved in some arbitrary manner.
Create a table LIK giving the correspondence between the pairs
$(Z,V) = (Y \oplus X, V)$ and the likelihood values

$$\log \frac{f_1(Z,V)}{f_2(V)} - R$$

Finally, construct the table CODEY that will have the correspondence
between $Y_1(D)$ and $[Y_1(D) G(D)]^{\ell}$.

The path extension procedure below will not be able to pick
every time the k-th most likely outgoing branch, because, e.g., in
case of ties, although the distance between a received sequence
$y_1, y_2, \ldots, y_n$ and two possible branch sequences
$x_1^1, \ldots, x_n^1$ and $x_1^2, \ldots, x_n^2$ might be
the same, the distances between the latter and the actual symbol sequence
$(y_1,v_1), (y_2,v_2), \ldots, (y_n,v_n)$ may turn out to be very different.
However, it is believed that most of the time the errors in ordering will
not damage the algorithm's performance too much. Furthermore, experiments
will no doubt bear out the simplicity and speed advantage of the suggested
extension procedure:

1) Look up $[Y_1(D) G(D)]^\beta$ in CODEY and form

$$B(D) = [P_i(D)]^\beta + [Y_1(D) G(D)]^\beta + Y_2(D)$$

2) Find the k-th entry $[Z_1(D), Z_2(D)]$ in the B(D) row of
LTABM and form the corresponding message sequence

$$S_i(D) = Z_1(D) + Y_1(D)$$

3) Form recursively the next parity state polynomial (D) : 

*1+1 1 V> = '{d- 1 [P.(D> + » 1+ lG<P»}* 


W” “ ( D_1 +S i+( G(D »1* 



4) Using the results of (2), look up in LIK the likelihoods
$\log (f_1(z_i,v_i)/f_2(v_i)) - R$ and form the likelihood increment $\lambda$
corresponding to the branch $S_i(D)$:

$$\lambda = \sum_{i=1}^{\beta} \left[ \log \frac{f_1(z_i,v_i)}{f_2(v_i)} - R \right]$$


III. BINARY SYMMETRIC CHANNEL - NON-SYSTEMATIC CODES


We will conclude by treating non-systematic codes of rate 1/2
for the BSC. The treatment of such codes for the symmetric channels
of Sec. II is similar and is left as an exercise.

Let the two generator polynomials be $G_1(D)$ and $G_2(D)$, and denote
the two parity state polynomials by $P_i^1(D)$, $P_i^2(D)$.

CASE I: One of $G_1(D)$, $G_2(D)$, say $G_1(D)$, is such that

$$g_{1,1} = g_{1,2} = \cdots = g_{1,\beta-1} = 0 \tag{11}$$


In this case the first position transmitted polynomial is

$$X_1(D) = [P_i^1(D)]^\beta + S_i(D) \tag{12}$$

(it is assumed that $g_{1,0} = 1$), and the second position polynomial is

$$X_2(D) = [P_i^2(D)]^\beta + [S_i(D) G_2(D)]^\beta \tag{13}$$

Therefore, the difference polynomials $Z_1(D)$ and $Z_2(D)$ are

$$Z_1(D) = Y_1(D) + [P_i^1(D)]^\beta + S_i(D) \tag{14}$$

$$Z_2(D) = Y_2(D) + [P_i^2(D)]^\beta + [S_i(D) G_2(D)]^\beta = B(D) + [Z_1(D) G_2(D)]^\beta \tag{15}$$

where

$$B(D) = Y_2(D) + [P_i^2(D)]^\beta + [(Y_1(D) + [P_i^1(D)]^\beta)\, G_2(D)]^\beta \tag{16}$$

is not a function of the branch being extended.

CASE II: Neither $G_1(D)$ nor $G_2(D)$ has leading coefficients
that satisfy (11). In this case

$$Z_1(D) = [P_i^1(D)]^\beta + S_i(D)[G_1(D)]^\beta + D^\beta F(D) + Y_1(D) \tag{17}$$

where $D^\beta F(D)$ is identical with the polynomial consisting of the higher
than $(\beta-1)$ degree terms of $S_i(D)[G_1(D)]^\beta$. Also, there exists a
polynomial H(D) of degree at most $\beta-1$ such that

$$[G_1(D)]^\beta H(D) = 1 + D^\beta E(D) \tag{18}$$

where E(D) is some polynomial of degree at most $\beta-2$. Post-multiplying
both sides of (17) by H(D) we get

$$(Z_1(D) + [P_i^1(D)]^\beta + Y_1(D))\,H(D) = S_i(D) + D^\beta E(D) S_i(D) + D^\beta F(D) H(D) \tag{19}$$

Since $[D^\beta F(D) H(D)]^\beta = 0$ and $[D^\beta E(D) S_i(D)]^\beta = 0$,
we get that

$$S_i(D) = [(Z_1(D) + [P_i^1(D)]^\beta + Y_1(D))\, H(D)]^\beta \tag{20}$$

Therefore,

$$Z_2(D) = Y_2(D) + [P_i^2(D)]^\beta + [S_i(D) G_2(D)]^\beta$$

$$= Y_2(D) + [P_i^2(D)]^\beta + [([P_i^1(D)]^\beta + Y_1(D))\, H(D) G_2(D)]^\beta + [Z_1(D) H(D) G_2(D)]^\beta$$



Hence

$$Z_2(D) = B(D) + [Z_1(D) H(D) G_2(D)]^\beta \tag{21}$$

where

$$B(D) = Y_2(D) + [P_i^2(D)]^\beta + [([P_i^1(D)]^\beta + Y_1(D))\, H(D) G_2(D)]^\beta \tag{22}$$

is not a function of the branch being extended.
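The polynomial H(D) used in CASE II is the inverse of the truncated generator modulo $D^\beta$, and it can be computed coefficient by coefficient. The sketch below is an illustration only: it assumes that the constant coefficient of $G_1(D)$ is 1, stores GF(2) polynomials as integers (bit j is the coefficient of D^j), and uses the hypothetical names `gf2_mul` and `inverse_mod_d_beta`.

```python
def gf2_mul(a: int, b: int) -> int:
    """Plain product of two GF(2) polynomials (bit j = coefficient of D^j)."""
    result = 0
    while b:
        if b & 1:
            result ^= a
        a <<= 1
        b >>= 1
    return result


def inverse_mod_d_beta(g: int, beta: int) -> int:
    """Return H with G(D) H(D) = 1 + D^beta E(D), i.e. G*H = 1 modulo D^beta.

    Uses the convolution identity sum_{j=0..n} g_j h_{n-j} = 0 for n >= 1,
    which (with g_0 = 1) gives h_n = sum_{j=1..n} g_j h_{n-j} over GF(2).
    """
    if g & 1 == 0:
        raise ValueError("constant coefficient of G(D) must be 1")
    h = 1                                   # h_0 = 1
    for n in range(1, beta):
        coeff = 0
        for j in range(1, n + 1):
            coeff ^= (g >> j) & (h >> (n - j)) & 1
        h |= coeff << n
    return h
```

For example, with G(D) = 1 + D + D^2 and $\beta = 4$ the routine returns 1 + D + D^3, and the product (1 + D + D^2)(1 + D + D^3) = 1 + D^4 + D^5 indeed has no terms of degree 1 through 3.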

It is clear that for the non-systematic codes, relations (14),
(15), and (16) (CASE I) or (20), (21), and (22) (CASE II) will be the
basis for our table construction and for our path extension strategy.

We suggest the formation of the following tables:

I. LTABM, listing for each of the $2^\beta$ different values of
B(D) the weight-ordered sequence of the $2^\beta$ different outgoing branches
$Z_1(D)$ (formula (15) is used for CASE I, and (21) for CASE II).

II. Table LTABW, listing for each B(D) the weights corresponding
to the outgoing branches of LTABM.

III. Table LIK, listing the correspondence between weights and
likelihoods.

IV. CODE I, listing the correspondence between Y(D) and
$[Y(D) G_2(D)]^\beta$ for CASE I and between Y(D) and $[Y(D) H(D) G_2(D)]^\beta$ for CASE II.

V. CODE II, listing the correspondence between Y(D) and
$[Y(D) H(D)]^\beta$ for CASE II.

We will now describe the method of finding the k-th most likely
outgoing branch for CASE II. The treatment of CASE I is similar.

1) Look up $W_1(D) = [([P_i^1(D)]^\beta + Y_1(D))\, H(D) G_2(D)]^\beta$ in CODE I
and form

$$B(D) = Y_2(D) + [P_i^2(D)]^\beta + W_1(D)$$

2) Find the k-th entry $Z_1(D)$ in the B(D)-row of LTABM, and form

$$W_2(D) = Z_1(D) + Y_1(D) + [P_i^1(D)]^\beta$$

3) Look up $S_i(D) = [W_2(D) H(D)]^\beta$ in CODE II and form recursively
the parity state polynomials $P_{i+1}^1(D)$ and $P_{i+1}^2(D)$. Store the
coefficients p(i+j+1) in the map.

4) Find the weight $w_k$ of the k-th entry in the B(D)-row of
LTABW and look up in LIK the likelihood of the extended branch.

NOTE: Extension of paths in non-systematic codes is clearly more
cumbersome than that for systematic codes. It is therefore the latter
that should be used wherever possible.



APPENDIX 3 


DESCRIPTION OF THE RUDIMENTARY AND PULL-UP DECODING ALGORITHMS 

It has been shown in Jelinek and Cocke that bootstrap hybrid
decoding is applicable to all channels symmetrical from the input that
have input alphabets in a finite Galois field. It is easiest to describe
the method first as it applies to binary symmetric channels (BSC). The
generalization to symmetrical channels with binary inputs and arbitrary
output alphabets is described in section II-B-2.

As usual, we will encode blocks of $\Gamma$ binary information symbols
into codewords of length $(\Gamma + t)/R$ where R is the sequential coding
rate and t is the length of the dummy information sequence (known to the
decoder) that is used to make the sequential decoding of the last
information symbols reliable. Let us encode m-1 blocks of information using
the same convolutional code. We will refer to the resulting codewords
as information streams. Let us arrange these streams underneath each
other, obtaining the solid line array of Figure 1. Let us then generate
the m-th parity check stream (interrupted line in Figure 1) whose
i-th digit will be the parity of the i-th digits of the m-1 information
streams. Stated in another way, the parity stream is a modulo 2
position-by-position sum of the information streams. Because of the
linearity of convolutional encoding, the parity check stream corresponds
to a path in the coding tree whose information digits are the mod 2 sums
of the information digits underlying the information streams. Hence, all
m of the streams are in principle sequentially decodable. Moreover, if any
subset of m-1 of these streams is correctly decoded, the remaining m-th
stream can be determined by use of the parity relationship (in fact,
Falconer's [Ref. (4), Part II] strategy is based solely on this
observation). We now describe the rudimentary bootstrap hybrid decoding
scheme. Suppose that the m streams are sent through the binary
symmetric channel, and that the corresponding received digits are
again arranged by the decoder into an m by $(\Gamma + t)/R$ array (see the
solid lines of Figure 2). If the j-th received stream is to be decoded,
the received digits of all other streams should also be taken into account,
since these contain information about the transmitted digits of the j-th
stream (the transmitted digits are related by the parity constraint).
However, it is easy to show that all the pertinent information of the
i-th received digits $y_i(1), y_i(2), \ldots, y_i(m)$ about the i-th transmitted
digit $x_i(j)$ in the j-th stream is contained in the pair $y_i(j)$,
$z_i = y_i(1) \oplus y_i(2) \oplus \cdots \oplus y_i(m)$. Therefore, let the decoder
generate an (m+1)-th channel state stream (see interrupted line of
Figure 2) whose i-th digit will be the parity of the i-th digits of
the m received streams. Before specifying exactly how the state
stream is to be used in the decoding, let us note that if it has a 1
in its i-th position, an odd number of received streams have an error
in the i-th position, and if the state stream has a 0 in the i-th
position, an even number of received streams have an error there.
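The parity check stream and the channel state stream are both position-by-position mod 2 sums, so a short sketch can illustrate why the state stream reveals the parity of the channel errors. The code below is only an illustration: the streams are taken to be already-encoded binary lists, and the names `columnwise_parity` and `transmit` are hypothetical.

```python
from functools import reduce
from operator import xor


def columnwise_parity(streams: list[list[int]]) -> list[int]:
    """Mod 2 position-by-position sum of equally long binary streams."""
    return [reduce(xor, column) for column in zip(*streams)]


def transmit(streams: list[list[int]], errors: list[list[int]]) -> list[list[int]]:
    """Apply a given BSC error pattern (1 = digit flipped) to each stream."""
    return [[bit ^ err for bit, err in zip(stream, pattern)]
            for stream, pattern in zip(streams, errors)]
```

Since the m transmitted streams sum to zero in every position, the parity of the received streams equals the parity of the error patterns, which is exactly the odd/even statement made above.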

Let $q_k(0)$ [$q_k(1)$] denote the probability that of k digits
independently transmitted through a binary symmetric channel, an even
[odd] number was incorrectly received. By a well known formula
(see Gallager (1963), p. 40),

$$q_k(0) = \frac{1 + (1-2p)^k}{2}, \qquad q_k(1) = \frac{1 - (1-2p)^k}{2} \tag{1}$$


where p is the crossover probability of the binary symmetric channel. 
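The closed form for the even/odd error probabilities can be checked numerically against the direct binomial sum. The following sketch is only a verification aid; the function names `q_parity` and `q_parity_direct` are hypothetical.

```python
from math import comb


def q_parity(k: int, parity: int, p: float) -> float:
    """Closed form: probability that the number of errors among k BSC uses
    is even (parity=0) or odd (parity=1)."""
    sign = 1.0 if parity == 0 else -1.0
    return 0.5 * (1.0 + sign * (1.0 - 2.0 * p) ** k)


def q_parity_direct(k: int, parity: int, p: float) -> float:
    """Same probability obtained by summing the binomial distribution."""
    return sum(comb(k, e) * p ** e * (1.0 - p) ** (k - e)
               for e in range(parity, k + 1, 2))
```

Note that as k grows, $(1-2p)^k$ tends to 0 and both probabilities approach 1/2, so the state stream carries little information when many streams remain undecoded.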

Let $z_i$ denote the i-th state stream digit, and let $y_i(j)$ and
$x_i(j)$ denote the i-th received and transmitted digits of the j-th
stream. For the purpose of decoding of the j-th stream we can view
the transmission process as having taken place over an augmented channel
with inputs $x_i(j)$ and outputs the pairs $(y_i(j), z_i)$. This channel is
governed by the transmission probability matrix $w_m(y,z/x)$ that is
specified by

$$w_m(0,0/0) = w_m(1,0/1) = (1-p)\, q_{m-1}(0)$$
$$w_m(0,1/0) = w_m(1,1/1) = (1-p)\, q_{m-1}(1)$$
$$w_m(1,0/0) = w_m(0,0/1) = p\, q_{m-1}(1)$$
$$w_m(1,1/0) = w_m(0,1/1) = p\, q_{m-1}(0) \tag{2}$$


When sequentially decoding the j-th stream, the receiver should use
in the usual way (Jelinek [1968] Sec. 105) the likelihood function

$$\lambda_m(i) = \log \frac{w_m(y_i(j), z_i / x_i(j))}{w_m(y_i(j), z_i)} - R \tag{3}$$

where

$$w_m(y,z) = \tfrac{1}{2}\left[ w_m(y,z/0) + w_m(y,z/1) \right] = \tfrac{1}{2}\, q_m(z) \tag{4}$$
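The augmented channel and its likelihood function can be tabulated directly from the BSC crossover probability. The sketch below is an illustration under assumptions: logarithms are taken to base 2 (so R is in bits per channel digit), and the names `q_parity`, `w_aug` and `likelihood` are hypothetical.

```python
from math import log2


def q_parity(k: int, parity: int, p: float) -> float:
    """Probability of an even (0) or odd (1) number of errors in k BSC uses."""
    sign = 1.0 if parity == 0 else -1.0
    return 0.5 * (1.0 + sign * (1.0 - 2.0 * p) ** k)


def w_aug(y: int, z: int, x: int, m: int, p: float) -> float:
    """Augmented channel w_m(y, z / x) for a stream among m undecoded streams."""
    own_error = y ^ x                  # was this stream's digit flipped?
    others_parity = z ^ own_error      # parity of errors in the other m-1 streams
    own = p if own_error else 1.0 - p
    return own * q_parity(m - 1, others_parity, p)


def likelihood(y: int, z: int, x: int, m: int, p: float, rate: float) -> float:
    """Per-digit likelihood: log of w_m(y,z/x) over the output marginal, minus R."""
    marginal = 0.5 * (w_aug(y, z, 0, m, p) + w_aug(y, z, 1, m, p))
    return log2(w_aug(y, z, x, m, p) / marginal) - rate
```

With m = 2 and state digit z = 1, both hypotheses x = 0 and x = 1 are equally likely, so the likelihood reduces to -R; with z = 0 the received digit is trusted much more than over the plain BSC.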


We are now ready to describe precisely the rudimentary bootstrap
hybrid decoding algorithm. Let a step in the decoding process consist
of a change of the decoder's node location in the coding tree. Let M
be some convenient positive integer. Let the decoder start out by
decoding the first stream (using the likelihood function (3) with j = 1).
If it does not complete the decoding job within M steps, it stores
the parameters necessary for resumption of decoding at the node at which
it was last located, and starts decoding the second stream (from its
origin). Again, if within M steps it does not successfully decode
the second stream, it stores the necessary parameters and switches its
attention to the third stream, etc. If it turns out that the
decoding was not completed on any of the m received streams within the
allotted M steps, the decoder returns to the first stream and resumes
its decoding from the point at which it left off (the parameters stored
previously for this purpose will enable it to do so). Again in this
second round a maximum of M additional steps is allotted to each stream,
and if this does not suffice a next round is started beginning with the
first stream, etc. After continuing in this manner the decoder will
finally succeed in decoding one stream, say the $j_1$-th. This means
that the decoder has found a path in the coding tree corresponding to
message digits whose symbols it believes to have been those of the
$j_1$-th transmitted stream. The receiver will then replace the $j_1$-th
received stream in the array of Figure 2 by the estimated $j_1$-th
transmitted stream and will recompute the symbols of the channel state stream.
Assuming the decoding to be errorless, a 1 in the i-th position of the
new state stream will indicate that an odd number of the m-1 undecoded
streams has an error in the i-th position, and a 0 will indicate that
an even number of transmission errors occurred. To decode any of the
remaining m-1 received streams the decoder will take advantage of the
newly computed channel state stream. Thus it will use the likelihood
function $\lambda_{m-1}$ based on the probabilities $w_{m-1}(y,z/x)$ that are defined
by (2) if m is replaced everywhere by m-1. Decoding will start from
the beginning of the first stream (assuming that $j_1 \neq 1$) and continue
in a round robin fashion (with the $j_1$-th stream excluded), each stream
being allocated M steps per try, until an additional stream is decoded,
say the $j_2$-th. As before, the $j_2$-th received stream is replaced by
the estimated $j_2$-th transmitted stream and the channel state stream is
accordingly recomputed. The decoding of the m-2 remaining received
streams then starts from the beginning node of the first undecoded stream
again, the likelihood $\lambda_{m-2}$ used being based on the probabilities
$w_{m-2}(y,z/x)$ defined in (2). The pattern is now clear; it only remains
to note that when m-1 streams have been decoded, the remaining stream
is determined from the parity constraint by taking mod 2 sums of the
corresponding digits of the m-1 decoded streams.
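The round robin control flow of the rudimentary scheme can be sketched independently of the sequential decoder itself. In the sketch below the decoder is abstracted as a caller-supplied function `try_decode(j, budget, undecoded_count)`, which is assumed to spend at most `budget` steps on stream j and to return True once that stream is decoded; this callback, the `max_rounds` safeguard and the name `round_robin_decode` are all hypothetical and are not part of the report.

```python
from typing import Callable


def round_robin_decode(num_streams: int, step_budget: int,
                       try_decode: Callable[[int, int, int], bool],
                       max_rounds: int = 10_000) -> list[int]:
    """Return the order in which streams are decoded.

    The last remaining stream is not decoded sequentially: it is obtained
    from the parity constraint, so it is simply appended at the end.
    """
    undecoded = list(range(num_streams))
    order: list[int] = []
    while len(undecoded) > 1:
        finished = None
        for _ in range(max_rounds):
            for j in undecoded:
                # Each stream gets at most step_budget steps per try.
                if try_decode(j, step_budget, len(undecoded)):
                    finished = j
                    break
            if finished is not None:
                break
        if finished is None:
            raise RuntimeError("no stream decoded within max_rounds rounds")
        undecoded.remove(finished)      # state stream would be recomputed here
        order.append(finished)
    order.append(undecoded[0])          # determined from the parity constraint
    return order
```

Passing the number of undecoded streams to the callback lets it select the appropriate likelihood function for the current state stream; whether a stream resumes or restarts after the state stream changes is left to the callback.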

Our method is seen to be a bootstrapping operation, with each
additional decoded stream being helpful in the decoding of the
remaining streams. Just how helpful the state stream is can be seen from
the extreme case when all but two streams have been decoded. Then, when
$z_i = 0$ the error probability in the i-th position on either of the
streams is $p^2/[p^2 + (1-p)^2]$, and when $z_i = 1$, the error probability
is 1/2 [the original crossover probability of the BSC is assumed to
be p]. We therefore place great reliance on the correctness of those
received digits corresponding to a 0 in the state stream, and no reliance
on those corresponding to a 1. This speeds up decoding immensely.
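The two error probabilities quoted for the case of two undecoded streams follow from enumerating the four joint error patterns of the two streams. The sketch below does this enumeration; the function name `error_prob_given_state` is hypothetical.

```python
from itertools import product


def error_prob_given_state(p: float, z: int) -> float:
    """P(first stream's digit is in error | state digit z), two undecoded streams."""
    numerator = 0.0
    denominator = 0.0
    for e1, e2 in product((0, 1), repeat=2):
        if e1 ^ e2 != z:               # pattern inconsistent with the state digit
            continue
        prob = (p if e1 else 1.0 - p) * (p if e2 else 1.0 - p)
        denominator += prob
        if e1:
            numerator += prob
    return numerator / denominator
```

For p = 0.05, for instance, a state digit 0 lowers the error probability from 0.05 to about 0.0028, while a state digit 1 raises it to exactly 1/2.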

We describe next the pull-up decoding algorithm as it applies to a
Fano sequential decoder. The modifications necessary for stack decoding
are easy and can be found in Jelinek and Cocke (Ref. 3). The pull-up
scheme will do away with the excessively frequent (one every M steps)
changes in the identity of the stream being decoded which involve a large
overhead cost. In fact, there is no need to discontinue work on one stream
as long as the decoder has not run into computational trouble such as
takes place when the value of the running threshold $T_0$ drops by a
predetermined amount U below the maximal value ever achieved.

We will say that a U-drop takes place at a node of depth i whose 


cumulative likelihood value is greater than or equal to 


and whose immediate predecessor has likelihood value less than or equal 


to 


$T_{MAX} - U$, where t is the threshold increment of the Fano Algorithm.



The following suggested procedure will apply directly to the BSC,
but its generalization to the various categories of channels symmetrical
from the input is obvious. To describe the scheme simply, we will need
to equip the channel state stream with an additional component $k_i$,
$i = 1, 2, \ldots, (\Gamma+t)/R$, whose purpose will be to indicate how many
streams have undecoded digits at position i. Thus at the start of the
process, $k_i = m$ for all i. The function $\lambda_{k_i}(i)$ (see (3)) will be
used in computing the likelihood of a branch of depth i belonging to the
j-th stream.

(1) Using the likelihood $\lambda_{k_i}(i) = \lambda_m(i)$ the receiver continues
to decode the first stream until either a U-drop takes place or the
decoding of the block is completed. If the latter event happens, the
received first stream is replaced by the estimated transmitted one, the
channel state stream is recomputed, and $k_i$ is decremented by 1 for
all i.

(2) If a U-drop takes place at a node of depth $i_1$, then all
branches on the path to that node up to depth $i_1 - J$ will be considered
definitely decoded, where J is a suitably large integer. Accordingly,
the corresponding received digits will be replaced by the estimated
transmitted ones, and the corresponding segment of the channel state
stream will be recomputed. All the parameters necessary for eventual
resumption of the decoding from the node at which the U-drop took place
will be saved. Also, the value of a new parameter k*(1) will be set
equal to the current value of $k_{i_1 - J + r}$, where r is a convenient
integer. Finally, the values $k_j$ will be decremented by 1 for
$j = 1, 2, \ldots, i_1 - J$, and a parameter I(1) will be set to $i_1 - J$.

(3) Decoding of the second stream will now begin based on the
functions $\lambda_{k_i}(i)$, and continue until either a U-drop or stream
decoding completion takes place. In the second eventuality, the values
$k_j$ will be decremented by 1 for all j. In the first eventuality, k*(2)
and I(2) are set equal to $k_{i_2 - J + r}$ and $i_2 - J$, and then all $k_j$,
$j \in (1, \ldots, i_2 - J)$, are decremented by 1, where $i_2$ is the depth of
the node at which the U-drop occurred. Decoding continues in the indicated
manner until all m of the streams have been worked on.

(4) If there exist integers > 0 such that k. =0 , 

i = 1,..., S 2 » k i = lj 1 then we £ind fche unique stream 

j* whose digits on levels - - - » 9-^ remain undecoded. These digits 

are then decoded from the algebraic constraint, the parameter T(j*) 
is set to ds set t0 d for d ~ » and tke 

parameters necessary to start decoding of the j* stream at the appro- 
priate node of level are stored. 

(5) Undecoded streams are next divided into two categories. Category
$S_1$ includes streams $j_1, j_2, \ldots, j_g$ ($g \le m-1$) such that
$k^*(j_t) > k_{I(j_t)+r}$, $t = 1, \ldots, g$ (note that $I(j_t)$ is the depth of
the furthest node of the $j_t$-th stream that has been definitely decoded).
Category $S_2$ includes all the remaining undecoded streams. Decoding of
the $j_1$-th stream will now start in the forward mode by placing the
decoder at the node at which the U-drop took place and setting the
threshold and cumulative likelihood values to 0. The established pattern
repeats until all of the streams $j_1, j_2, \ldots, j_g$ of $S_1$ have been
worked on, except that $k_i$ will be decremented only for values
$i > I(j)$ when work on the j-th stream is terminated. If any segment of
any stream can be definitely decoded from the algebraic constraint, this
is done and new parameters for that stream are determined as described in
the preceding step. The undecoded streams are again partitioned into the
categories $S_1$ and $S_2$. Note that the new $S_1$ may now include some
streams that belonged to the old $S_2$. If $S_1$ is not empty and more
than one undecoded stream remains, decoding of the streams of $S_1$
continues as before. If only one undecoded stream remains, its identity is
determined from the parity information and the task is completed.

(6) If $S_1$ is found empty while $S_2$ contains more than one stream,
only one of two actions is possible. Either the decoding effort is
abandoned or the size of U is increased and all of the undecoded
streams are put into $S_1$. After all the latter have been worked on,
a new $S_1$ is again formed in the regular manner. If the new $S_1$ is
empty, U must be increased further; if not, then work on streams of $S_1$
resumes with U equal to its original value.

As pointed out earlier, analysis of a slight modification of this
pull-up algorithm reveals (see the Appendix of Jelinek and Cocke) that
upper and lower bounds on $E[N^{\gamma}]$ can be obtained that are essentially
independent of the block length $\Gamma$.


FIGURE CAPTIONS 


Figure 1: The structure of the encoding block. 
Figure 2: The structure of the decoding block. 
APPENDIX 4


NEW UPPER BOUNDS ON CERTAIN COMPUTATIONAL PARAMETERS 
OF BOOTSTRAP HYBRID SEQUENTIAL DECODING 


1. Introduction

We will be considering binary input discrete memoryless channels
that are symmetrical from the input. However, the results are completely
generalizable to all channels symmetrical from the input. We impose the
restriction to simplify our proofs.

A binary input channel of that class can be described as follows:
Let any input $x \in \{0,1\}$ produce at the output a pair of digits (y,u),
$y \in \{0,1\}$, $u \in \{0,1,\ldots,b-1\}$, and let the underlying channel
transmission probability distribution have the following characteristic:

$$w(0,u/0) = w(1,u/1)$$
$$w(1,u/0) = w(0,u/1) \tag{1}$$

for all $u \in \{0,\ldots,b-1\}$. Except for (1), the transmission function
w(y,u/x) will be considered arbitrary. Note that for the BSC, b = 1,
so the u-portion of the pair may be omitted. In the sequel we will be
considering the bootstrapping hybrid coding scheme that transmits M
streams, M-1 of which are convolutionally encoded binary information
digits, and the M-th stream is a modulo 2, position-by-position sum of
the first M-1 streams. The convolutional code used is the same for
each stream and as a consequence the M-th stream is also a codeword
and can thus be decoded. The code will generate a tree with $2^k$ branches
leaving each node, m digits to a branch (thus the rate R = k/m), and
it will simplify our reasoning if the code constraint length will be
infinite.



The decoding at the receiver will be done in the way described in
Section VI of reference 2. Suppose K streams are left undecoded
(K < M), and let

$$[\mathbf{y}_i, \mathbf{u}_i] = [(y_i(1),u_i(1)), (y_i(2),u_i(2)), \ldots, (y_i(K),u_i(K))]$$

be the vector of received digit pairs of the K undecoded streams
in the i-th position. Then the decoding of the J-th stream will be
based on likelihoods



( 2 ) 


where the subscript K indicates the number of undecoded streams, and
$t_i$ is the parity of the i-th position digits that the decoder determined
to have been transmitted in the M-K decoded streams. Section IV of
reference 2 shows how the righthand side of (2) can be simplified and
easily computed. The probability $P_K[\mathbf{y}_i, \mathbf{u}_i / x_i(J), t_i]$ is, of course, given by


P Kf Z i ,S i^*i^ ,t il + w(y i (J),u i (J)/x i (J)) 


X 




w(y i (j),u i (j)/x 1 (j)) 


(3) 


and 




We conclude this section by proving 
Lemma 1

Let a channel satisfying (1) and a convolutional code be given. The 
distribution of the number of decoding steps for any stream as well as 
the probability of error are invariant with respect to the actual informa- 
tion sequences encoded. 



Proof 


Let $\mathbf{s}_1, \mathbf{s}_2, \ldots, \mathbf{s}_{M-1}$ be the information sequences of the first
M-1 streams. Then by linearity of the convolutional code, the M-th
stream corresponds to the sequence $\mathbf{s}_M = \mathbf{s}_1 + \mathbf{s}_2 + \cdots + \mathbf{s}_{M-1}$, where
mod 2, position-by-position sum is understood. Let the corresponding
codewords be denoted by $\mathbf{x}(\mathbf{s}_1), \ldots, \mathbf{x}(\mathbf{s}_M)$, where, of course,

$$\mathbf{x}(\mathbf{s}_M) = \mathbf{x}(\mathbf{s}_1) + \cdots + \mathbf{x}(\mathbf{s}_{M-1}) \tag{5}$$

Suppose the received sequence pairs are $(\mathbf{y}_1,\mathbf{u}_1), \ldots, (\mathbf{y}_M,\mathbf{u}_M) = (\mathbf{y},\mathbf{u})$.
Consider the J-th stream ($J \in \{1,\ldots,M\}$) and let $\mathbf{x}(\mathbf{s},J)$ be its
codeword corresponding to some arbitrary information sequence $\mathbf{s}$.

Then the likelihood associated with this codeword depends on the 


probabilities 




'1} 


where 0 denotes an all zero sequence. 

rs. 


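Now because of (1), the probability of receiving $(\mathbf{y},\mathbf{u})$ when
$\mathbf{x}(\mathbf{s}_1), \ldots, \mathbf{x}(\mathbf{s}_M)$ was transmitted is the same as the probability of
receiving $(\mathbf{y}_1 + \mathbf{x}(\mathbf{s}_1), \mathbf{u}_1), \ldots, (\mathbf{y}_M + \mathbf{x}(\mathbf{s}_M), \mathbf{u}_M)$ when
$\mathbf{x}(\mathbf{0}), \ldots, \mathbf{x}(\mathbf{0}) = \mathbf{0}, \ldots, \mathbf{0}$ was transmitted. Furthermore, it follows
from (3) that for any $\mathbf{s}$ and J,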

p Mi<h* ) °1 ’ 

P M fe + <y M + = 

= P M { Jl + 5&1> >il>> •••■<% + ifeM ) -Al )/ S ( t + ^J’ J) -2.'i <8) 



where the last equality is a consequence of the linear character of 
convolutional codes. It follows directly from (8) and (4) that also 



M 



+ x(s ),u,) 



+ x ( s ™ 
/v 



(9) 


Since both whether or not an error was committed and the number of
decoding operations depend on the likelihoods associated with the
various paths in the tree and on their relation to each other, we see
from (8) and (9) that these parameters will have the same value when
$\mathbf{s}_1, \ldots, \mathbf{s}_M$ are transmitted and $(\mathbf{y}_1,\mathbf{u}_1), \ldots, (\mathbf{y}_M,\mathbf{u}_M)$ are received
(event A) as when $\mathbf{0}, \ldots, \mathbf{0}$ are transmitted and
$(\mathbf{y}_1 + \mathbf{x}(\mathbf{s}_1), \mathbf{u}_1), \ldots, (\mathbf{y}_M + \mathbf{x}(\mathbf{s}_M), \mathbf{u}_M)$ are received (event B).
The conclusion of the Lemma then follows from the observation that both
events A and B have equal probabilities for any $\mathbf{s}_1, \ldots, \mathbf{s}_M$, and any
$(\mathbf{y}_1,\mathbf{u}_1), \ldots, (\mathbf{y}_M,\mathbf{u}_M)$.

QED

Corollary 

When evaluating the probability of error or the distribution of
the number of decoding steps in the bootstrapping hybrid decoding
scheme used with a binary input symmetrical channel, it may always be
assumed that all-zero information sequences have been sent.


2. Some Preliminary Results

Let M streams be received and let $N_i^M$ ($i \in \{1,\ldots,M\}$) be the
number of decoding steps in the first incorrect subset of the i-th
stream when the stack sequential decoding algorithm is used. In
this section we will derive an upper bound on

$$E\left[\min_{1 \le i \le M} N_i^M\right] \tag{10}$$

We will follow a modification of an approach developed by Zigangirov (Ref. 3).
Consider the operation of the stack algorithm in the incorrect
subset that starts with a particular branch emanating from some node
whose path likelihood value is z. Let the stack algorithm continue
its operation until for the first time the likelihood value on the top
of the stack falls below $\delta$ ($\delta < z$). Let $n_\delta(z)$ denote the number of
operations until the stopping rule is invoked, and let (the expectation
is over the ensemble of convolutional codes and over the transmission
process)

$$N_\delta(z) = E[n_\delta(z)] \tag{11}$$


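Let $2^k$ be the number of branches leaving each node, and let $\mathbf{y}$, $\mathbf{u}$, $\mathbf{x}$
be the sequences of length m of y, u, x corresponding to a branch.
Define the branch likelihood function

$$\lambda(\mathbf{y},\mathbf{u}/\mathbf{x}) = \log \frac{P_M(\mathbf{y},\mathbf{u}/\mathbf{x},\mathbf{0})}{P_M(\mathbf{y},\mathbf{u}/\mathbf{0})} - mR \tag{12}$$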


Then, since in the code ensemble the branches in the incorrect subset
are selected independently from the all-zero transmitted branches (see
Lemma 1), $N_\delta(z)$ satisfies the difference equation


M 


V z> 


= 2 


k 


N g (z+ 




~mj 


w(y(i),t(i)/0) 4- 




i=l 


z > 6 (13) 


N g (z) = 0 


z < 6 


Lemma 2 

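For $\delta < 0$,

$$N_\delta(z) \le \frac{1}{2^k - 1}\left[2^{\rho(z - \delta - \alpha)} - 1\right] \tag{14}$$

where

$$\alpha = \min_{\mathbf{y},\mathbf{u},\mathbf{x}} \lambda(\mathbf{y},\mathbf{u}/\mathbf{x}) \tag{15}$$

and $\rho \in (0,1)$ satisfies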


w(y(i) s u(i)/0) 


P M (Z»Ii/ x 9 0) 

p M "(ivu/o") 


“ [1-(1t P )R] 

< 2 P (16) 


Proof 

By the well-known maximum principle (Ref. 4), $N^*_\delta(z)$ will be an upper bound
on $N_\delta(z)$, provided that

$$N^*_\delta(z) \ge 0 \quad \text{for } z \in (\delta + \alpha, \delta) \tag{17}$$

and that the lefthand side of (13) is not smaller than the righthand
side for $z > \delta$ when $N^*_\delta(z)$ is substituted for $N_\delta(z)$. Substituting

$$N^*_\delta(z) = \frac{1}{2^k - 1}\left[2^{\rho(z - \delta - \alpha)} - 1\right] \tag{18}$$

into the righthand side of (13), we get


2 P( z ~ 6 " a ) 2 k ~ m 2 P ^^’ 1 Pw(y(i) ,u(i)/0 ) 


1 ** 

2 k -l 2 k -l 


1_ 1 2 P \ 2 k-m-mpR 

2 k -l 1 


-** -~=> — > 
Z, x 


1 ^ -v ^ M 

V" vi>t : / x ,o)-A-^ 

/ • 1 v A w(y(i),u(i)/0) 

rr„ V u 

u, x L J 


(19) 
Since $N^*_\delta(z)$ does satisfy (17), then the bound (14) will be valid

5 

provided the braced expression in (19) is smaller than or equal to 1. 
Using Holder's inequality and the relation (16) (we are making use of 
the independence of digits along branches), 



_ gin-kimp R 


Therefore, the righthand side of (19) is less than or equal to $N^*_\delta(z)$
and the Lemma is proven. QED

Next, let $z_1, z_2, \ldots, z_M$ be the likelihood values on the i-th
($i \ge 0$) nodes of the true paths of the M received streams (by Lemma 1
these are the all-zero paths). Let v < 0 be arbitrary and define the
indicator function $\varphi_v(z_1, z_2, \ldots, z_M)$ to be equal to 1 if the likelihood
on all M of the true paths leaving the i-th node falls below the
value v. Otherwise let $\varphi_v(z_1, \ldots, z_M)$ be equal to 0. Furthermore,
define

$$\Phi_v(z_1, z_2, \ldots, z_M) = E[\varphi_v(z_1, z_2, \ldots, z_M)]$$

Since the all-zero information path corresponds to the all-zero transmitted
sequence, $\Phi_v$ satisfies the following recurrence:


V a i’--*’ Z M ) = 


i’ v ( 2 l + "X 1 
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wCytl) ,"u(l)/0) . , . w (y*(M) j'u’CM) fS) 


( 22 ) 


if > V for some i , 


® (z 1 * * • • * z ) 1 

V 1 m 

Lemma 3 


if z. < v for all i 
l — 


* v (z r ...,Z M ) if max (.. ^ > 


(23) 


where v < 0 , ^ > 0 satisfies 


M 

r | _ |w(y(i),u(i)/0) 

i=l 


P M (z.n/0) 


l-c 


< 2 


1-p 


MR 


(24) 


and $\rho \in (0,1)$ is the parameter defined in (16).


Proof 

By the maximum principle, $\Phi^*_v(z_1, \ldots, z_M)$ will be an upper bound
on $\Phi_v(z_1, \ldots, z_M)$, provided that

$$\Phi^*_v(z_1, \ldots, z_M) \ge 1 \quad \text{if } z_i \le v \text{ for all } i \tag{25}$$

and that the lefthand side of (22) is not smaller than the righthand
side when $\Phi^*_v$ is substituted for $\Phi_v$. The function

$$\Phi^*_v(z_1, \ldots, z_M) = 2^{-\mu(z_1 + \cdots + z_M - Mv)} \tag{26}$$

surely satisfies (25). Substituting it for $\Phi_v$ into the righthand
side of (22) we get
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-^(z 1 +...+z m -Mv) pr- 



M 




w(y^ (i) j ~u(i)/~0,0) 


->-> i=l 

Z>iL 


= 2 




M 


i 




v K (t,t/ °) 




w(y(i),if(i)/<^ 0) 


(27) 


Thus $\Phi^*_v$ will be an upper bound on $\Phi_v$ provided the value of the braced
expression in (27) does not exceed 1. However, for $\rho \in (0,1)$ that
value is by Holder's inequality dominated by


-jPbMmR 



^feE/o, 0 ) 


P M (y.»E/o) 


1 ' , P \ l-r, 

—> -j* -£■ | P 

w(y(i),u(i)/0,0) ) < 


Z.H. 


2'p.WtoR 2 -p,MmR 


where the inequality is due to (24) and the fact that digits along
branches are independent. QED

Finally, let us define the function

$$\bar{N}_{v,\delta}(z_1, \ldots, z_M; z) = E[\varphi_v(z_1, \ldots, z_M)\, n_\delta(z)] \tag{28}$$

where it is understood that

a) $n_\delta(z)$ refers to the incorrect subset of some arbitrary but
fixed stream $J \in \{1, 2, \ldots, M\}$

b) the likelihoods $z_1, \ldots, z_M, z$ occur on the same i-th depth
level in all streams.

$\bar{N}_{v,\delta}(z_1, \ldots, z_M; z)$ may thus be interpreted as the expected number (over
the set of events for which $\varphi_v(z_1, \ldots, z_M) = 1$) of decoding steps in
the J-th incorrect subset stemming from some branch that leaves a node
on depth i whose likelihood value is z, if decoding terminates when
the likelihood value of the top of the stack falls below $\delta$. $\bar{N}_{v,\delta}$ then
satisfies the recurrence





it: 




• ,z M + , z+ 


M 


,-m 


r=l 


w(y"(i) s u’(i)/0,'0) 


■b § (z-, j • • • j Z ) 

V 1 M 


(29) 


if z .> 8 and max (z^,...,z^) > v where 

i 


^ 6 (z lf ...,z M ;z) = N g (z) if V 

^7 g (z 1 ,...,z M ,z) = 0 if z < 8 (30) 


Lemma 4 


^t,8 ( z 1»“*» z m jZ ) - 2 k_ x 


[2 


*p,(z 1 +. , .+z m -Mv)+ p (z-s-cy) 


- v ^ z l> * ■ 


(31) 


where $\rho$ satisfies (16), $\mu$ satisfies (24), and $\alpha$ is defined in (15).


• Proof 

Let g (z^, . . -jZ^., z) be the righthand side of (31) „ .Then by 
(23) and (14), 
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T 

z M . 2 ) > n (z) if max ( z ls *-- 5 z M ) < v 

’ ' X 


Furthermore,, if z > 6+ o' then since p e (0,1), we get 
* 1 ~h(z 1 +...+k m -Mv) 

tfv, 8^ Z 1 , '*‘ ,Z M ,Z ' ) - 2 k^ 1 C 2 " V*1 


where the last inequality follows from -Lemma 3 . Thus by the maximum 

principle^ will be an upper bound on 'tEr- provided the 

■* v, o J- v* 3 8 

righthand side of (29) exceeds the lefthand side when 1 ^ 7 -“ ^ is sub- 
stituted for TL7‘ into it. The righthand side of (29) is then equal to 
V, o 







V z i V + V z i*”*»V 


Thus all we need to show is that the expression in braces does not exceed 1. In
fact, it is equal to


gk-m-pmR+jiMmR 










M 

- 


i=: 

- 





w 


(y(i) > u(i)/o,‘o) 



M 


w(y(i),'u(i)/6,0)| 


i=l 


\ ^ 







M 




J i=l 


^(y,?/o,Q) 


V K (t,t/ 0 ) 


JL_ 

■1“P 


w 


(^(i)^(i)/o7£) 


2 k^m-pmRi^faiR 2 m (l" (l“p )R) g-(j,MmR 


1 


( 32 ) 



The inequality in (32) follows from Holder's inequality, the 
from the fact that .k = mR, and the next-to-last equality from the fact 
that p, and p satisfy relations (16) and (24) (by definition of the 
probability measure the first braced expression on the lefthand 

side of (32) is independent of J) „ QED 


3. An Upper Bound on the Expected Minimum Number of Decoding Steps
in the First Incorrect Subset

We will now use the conclusion of Lemma 4 to obtain an upper bound
on the quantity $E[\min_{1 \le i \le M} N_i^M]$ described at the beginning of Section 2.
Note first that the upper bound (31) is independent of the index J
of the stream whose incorrect subset is being decoded (see (28) and
following). Let $\delta$ be the maximum of the likelihood minima pertaining
to the correct paths of the M different streams. If this maximum
is attained on the J-th stream, then the number of steps in the first
incorrect subset of the J-th stream will be exactly $n_\delta(0)$. Since
the first node of each stream has likelihood 0, it follows that


E [min »“] < 2 k -l / (0,.., ,0,0)1 

l<i<M 1 / Lbv-tv,6 J 


do (33) 


V-6 


where the coefficient $2^k - 1$ is necessary because there are that many
incorrect branches leaving the first node. Now using (31),


fc!^0’—0-0) * i, 

M “ J- 


— Mp, 2 (iMv " p( §+G ') 


since ? is an increasing function of v. Hence 


E C,”i" <] * 


-l<i<)i 


2~P q> 

l-(p/%) 


if Mp > p 


(34) 



Now in (24) we have a relationship of the form 



2 


MR 

i-p 


Since the lefthand side is a monotone increasing function of $\mu$, the
inequality is easier to satisfy if $\mu$ is as small as possible. But
(34) says that $\mu > \rho/M$. So, in order to find the rate R below which
the lefthand side of (34) is finite, we will set $\mu = \rho/M$. Inequality
(24) then becomes equivalent to




and it is understood that $\rho \in (0,1)$. It can be shown that the lefthand side of (35), $F_M(\rho)$, is monotonically increasing with $\rho \in (0,1)$, and the lefthand side of (36), $G_M(\rho)$, is monotonically decreasing. Therefore, if $F_M(0) \le G_M(0)$ and $F_M(1) \ge G_M(1)$, then there is a unique $\rho_M \in (0,1)$ such that $F_M(\rho_M) = G_M(\rho_M)$, and for all

$$R < -\log F_M(\rho_M) = -\log G_M(\rho_M) \qquad (37)$$

the expected minimal amount of computation in the first incorrect subset is bounded by a constant. Since it can be shown that $F_M(0) \le G_M(0)$ and $F_M(1) \ge G_M(1)$ always hold, we have



Theorem 1

Let $\rho_M \in (0,1)$ be the unique value for which $F_M(\rho_M) = G_M(\rho_M)$. Then $E[\min_{1 \le i \le M} N_i^M]$ is upper bounded by a constant for all rates

$$R < -\log F_M(\rho_M).$$

$N_i^M$ is the number of decoding steps in the first incorrect subset of the ith stream when M streams have been transmitted.


Let us next define $N_i(K)$ to be the number of decoding steps in the first incorrect subset of the ith among the K streams that have been left undecoded (i.e., $M \ge K$ streams were received, M-K were decoded by the hybrid method, and K streams, probably the most difficult ones, are still to be decoded). We suggest that a very good measure of computational complexity is the parameter


[ 


max 

2<K<M 


mm 

1<L<K’ 



mm 
w l<i<K 


N t (K) 


G 


k=2 


(38) 


which may be interpreted as the expected maximum number of decoding steps
that need be done, in the course of decoding the entire hybrid block, in
any first incorrect subset.

Let $i_1, i_2, \ldots, i_K$ ($i_j \in \{1, 2, \ldots, M\}$) be the indexes of those K
streams that remain undecoded. Now by definition,


CO 



that there is a subset of K streams from among the M which, when considered together, are such that the first incorrect subset of each stream requires more than $\ell$ steps for its decoding. Hence by the union bound,



Therefore by (38) , 



K=2 


From (39) and Theorem 1 we can then come to the following conclusion.


Theorem 2 


Let $\rho_K \in (0,1)$, K = 2, 3, ..., M, be the unique values for which $F_K(\rho_K) = G_K(\rho_K)$. Then $E[\max_{2 \le K \le M} \min_{1 \le i \le K} N_i(K)]$ is upper bounded by a constant for all rates

$$R < \min_{2 \le K \le M} [-\log F_K(\rho_K)] \qquad (41)$$
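The rate threshold in Theorems 1 and 2 is defined by the crossing point of an increasing function $F_K$ and a decreasing function $G_K$ on (0,1). A minimal sketch of how such a crossing could be located numerically by bisection follows. The functions `F_toy` and `G_toy` are placeholders chosen only so that the example runs; they are not the report's $F_K$ and $G_K$. The names `crossing_point` and `rate_threshold` are hypothetical, and the base-2 logarithm is an assumption.

```python
import math


def crossing_point(F, G, tol=1e-12):
    """Bisection for rho in [0, 1] with F(rho) = G(rho).

    Assumes F is increasing, G is decreasing, F(0) <= G(0), F(1) >= G(1).
    """
    lo, hi = 0.0, 1.0
    if F(lo) > G(lo) or F(hi) < G(hi):
        raise ValueError("F and G do not cross on [0, 1]")
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if F(mid) < G(mid):
            lo = mid   # crossing lies to the right of mid
        else:
            hi = mid   # crossing lies at or to the left of mid
    return 0.5 * (lo + hi)


def rate_threshold(F, G):
    """Return (rho, -log2 F(rho)) at the crossing point (base 2 is assumed)."""
    rho = crossing_point(F, G)
    return rho, -math.log2(F(rho))


# Placeholder functions, NOT the report's F_K and G_K.
def F_toy(rho):
    return 0.25 + 0.5 * rho


def G_toy(rho):
    return 0.75 - 0.5 * rho


rho_toy, rate_toy = rate_threshold(F_toy, G_toy)
```

For Theorem 2 the same routine would be applied for each K = 2, ..., M and the smallest resulting threshold taken.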



APPENDIX 5 


ESTIMATION OF $EN_a$ AND $EN_b$


Lemma 1: Equation

$$2^{1-R} = e^{s(D^*-1)} + e^{sD^*} \qquad (1)$$

has a (possibly complex) solution s for all $D^* \in (0,\infty)$.

Proof

Suppose first $D^* = p/q$, with p, q integers. Then (1) becomes

$$2^{1-R} = e^{s(p-q)/q} + e^{sp/q} \qquad (2)$$

Making the variable change $(e^s)^{1/q} = z$ and multiplying by $z^q$, (2) becomes

$$z^q\, 2^{1-R} = z^p + z^{p+q} \qquad (3)$$

Now (3) is a polynomial equation in z, and as such has at least one root by the fundamental theorem of algebra. If $z_0$ is one of its nonzero roots, then clearly $e^s = (z_0)^q$ gives a root of (1). Observing that rational numbers are everywhere dense on the real line, the Lemma follows. QED
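As a worked instance of the substitution used in this proof, the sketch below takes the special case $D^* = 1/2$ (p = 1, q = 2), where (3) reduces, after dividing out the trivial root z = 0, to the quadratic $z^2 - 2^{1-R} z + 1 = 0$. The function names are hypothetical and the choice of R = 0.5 in the usage line is only an example.

```python
import cmath


def solve_lemma1_half(R):
    """Solve 2^(1-R) = e^{s(D*-1)} + e^{s D*} for the special case D* = 1/2.

    With p = 1, q = 2 and z = (e^s)^(1/q), equation (3) is
    z^2 * c = z + z^3 with c = 2^(1-R), i.e. z^2 - c z + 1 = 0 for z != 0.
    """
    q = 2
    c = 2.0 ** (1.0 - R)
    z0 = (c + cmath.sqrt(c * c - 4.0)) / 2.0   # one root of the quadratic
    return q * cmath.log(z0)                   # s such that e^s = z0^q


def residual(s, R, d_star):
    """Right side minus left side of equation (1); zero at a solution."""
    return cmath.exp(s * (d_star - 1.0)) + cmath.exp(s * d_star) - 2.0 ** (1.0 - R)


s_example = solve_lemma1_half(0.5)
```

For 0 < R < 1 the discriminant is negative, so the root has unit modulus and s is purely imaginary in this special case, which illustrates why the lemma allows a complex solution.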


Theorem 2 


Under the hypotheses of Theorem 1, $M < \pi/\omega$ implies



cos q)b 
sin tub 


r) 


cos mb 
sin tub 


-) 


(5a) 


(5b) 


where r and $\omega$ are solutions of

$$2^{1-R} = r^{D^*-1} \cos \omega(D^*-1) + r^{D^*} \cos \omega D^*$$

$$0 = r^{D^*-1} \sin \omega(D^*-1) + r^{D^*} \sin \omega D^* \qquad (4)$$

which exist for all $D^* \in (D(R), 1/2)$.

Proof : 

fch. 

For pathg ;of length ^ > 1 , let the j path cumulative metric 

fch 

be denoted " p . . Denote the metric of the j single branch p . . 

J 

Observe the p^ : 'all tbee branches are I.l.D . 

For some complex s, define

$$T_0(s) = \sum_{i=1}^{d} e^{s\mu_i} \qquad (6)$$


and define 


V s) 


* +1 

Z sp. KT " 1 s M-i 

• ■ L* 


(7) 


where the $\sum^*$ is meant to run over all paths, frozen and unfrozen, at
level $\ell$.


$f_j$ is defined in one of two ways:

1) If node j at level $\ell$ is frozen, we arbitrarily define there to be one extension to level $\ell+1$ with zero additional metric. Thus

$$f_j = 1 \cdot e^{s \cdot 0} \qquad (8)$$

2) If node j at level $\ell$ is not frozen, we define $f_j$ to reflect d extensions with each branch having an I.I.D. $\mu_i$. So

$$f_j = \sum_{i=1}^{d} e^{s\mu_i} \qquad (9)$$



Suppose s can be chosen such that $ET_0(s) = 1$ (it will be seen that in our case s will exist and will be complex). By the I.I.D. property of the branch incremental metrics $\mu_i$,

$$ET_0(s) = d\, E\, e^{s\mu_1} = 1 \qquad (10)$$

It follows immediately that for all nodes, frozen or not,

$$Ef_j = 1 \qquad (11)$$

We now show $ET_{\ell+1}(s) = ET_\ell(s)$, thus proving by induction that

$$\lim_{\ell \to \infty} ET_\ell(s) = 1$$


Write ET« (s) — E 

* (over j) 

E 


j fixed 


■ — 

3 

— 


E 


sJ 

e *' J E f ) 


( 12 ) 


= E 


J>- 


E ^ 


by (11) 


“ E Vl 00 


We can now rewrite (12), breaking up the sum into sums of paths
frozen at a, paths frozen at b, and paths remaining active over
an infinite length:

■ i - e [£• 

frozen frozen ooiy 

at a at b active 


s Jii 


+ E 


_E^] + e[£ « s E 


( 13 ) 





Theorem 1 implies that the third term in (13) is zero so long as $|b-a| < \pi/\omega$. The first two terms are approximately $EN_a\, e^{sa}$ and $EN_b\, e^{sb}$ respectively.

In actuality, frozen paths do not have precisely metrics a or b, since paths may "overshoot" the barriers before freezing. The ambiguity in (14) may be resolved, but only with tedious calculations, which will not appear here.


Thus (13) may be rewritten

$$1 \approx EN_a\, e^{sa} + EN_b\, e^{sb} \qquad (14)$$

If the value of s which satisfies (10) is expressed as

$$e^s = r[\cos\omega + i \sin\omega]$$

we can write (14) in real and imaginary parts,

$$1 \approx EN_a\, r^a \cos\omega a + EN_b\, r^b \cos\omega b \qquad (15)$$

$$0 \approx EN_a\, r^a \sin\omega a + EN_b\, r^b \sin\omega b \qquad (16)$$

(15) and (16) are simultaneous equations in the two unknowns $EN_a$ and $EN_b$. When solved, they yield the claimed result.
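The final step of this argument is the solution of a pair of linear equations: the real and imaginary parts of $1 \approx EN_a e^{sa} + EN_b e^{sb}$ with $e^s = r(\cos\omega + i\sin\omega)$. A minimal sketch of that solution by Cramer's rule is given here. The function name `solve_en` and the numerical values in the usage line are assumptions made for illustration, and the approximate relations are treated as exact equalities.

```python
import math


def solve_en(r, omega, a, b):
    """Solve the 2x2 system
         1 = EN_a r^a cos(omega a) + EN_b r^b cos(omega b)
         0 = EN_a r^a sin(omega a) + EN_b r^b sin(omega b)
    for (EN_a, EN_b) by Cramer's rule.
    """
    a11 = r ** a * math.cos(omega * a)
    a12 = r ** b * math.cos(omega * b)
    a21 = r ** a * math.sin(omega * a)
    a22 = r ** b * math.sin(omega * b)
    det = a11 * a22 - a12 * a21   # equals r^(a+b) * sin(omega * (b - a))
    if abs(det) < 1e-12:
        raise ValueError("system is singular: sin(omega (b - a)) is zero")
    en_a = a22 / det
    en_b = -a21 / det
    return en_a, en_b


# Example values only (hypothetical barriers a < 0 < b).
en_a_example, en_b_example = solve_en(1.1, 0.3, -2.0, 3.0)
```

The determinant is proportional to $\sin\omega(b-a)$, so the system is nonsingular whenever $0 < |b-a| < \pi/\omega$, the same condition under which the third term of (13) vanishes.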

It remains to show that s exists satisfying (10). Now,

$$E[T_0(s)] = 2^{-n}\, d \sum_{k=0}^{n} \binom{n}{k} e^{s(nD^*-k)} = 1 \qquad (17)$$

when the source and distortion measure are used to evaluate the expectation. (17) in turn reduces to

$$2^{1-R} = e^{s(D^*-1)} + e^{sD^*} \qquad (1)$$

whose solution exists by Lemma 1.

QED
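The reduction of (17) to (1) rests on the binomial theorem. The sketch below checks the identity numerically under the assumption, not stated in this passage, that each node has $d = 2^{nR}$ branches of n letters each. Under that assumption the sum in (17) equals the nth power of $2^{R-1}(e^{s(D^*-1)} + e^{sD^*})$, so it equals 1 whenever (1) holds. All function names are hypothetical.

```python
import cmath
from math import comb


def expected_t0(s, n, R, d_star):
    """Left side of (17), assuming d = 2^(n R) branches per node."""
    d = 2.0 ** (n * R)   # assumption: d = 2^(nR)
    total = sum(comb(n, k) * cmath.exp(s * (n * d_star - k)) for k in range(n + 1))
    return 2.0 ** (-n) * d * total


def per_letter_factor(s, R, d_star):
    """2^(R-1) (e^{s(D*-1)} + e^{s D*}); equation (1) says this equals 1."""
    return 2.0 ** (R - 1.0) * (cmath.exp(s * (d_star - 1.0)) + cmath.exp(s * d_star))


# Example values only.
s_val, n_val, R_val, d_val = 0.3 + 0.7j, 5, 0.5, 0.25
lhs = expected_t0(s_val, n_val, R_val, d_val)
rhs = per_letter_factor(s_val, R_val, d_val) ** n_val
```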
APPENDIX 6

PROOFS AND ALGORITHMS FOR PERMUTATION CODING 
A. Proof of Optimality of Encoding Procedure

Theorem: Let $f(\cdot)$ be any nonnegative, monotone nondecreasing, convex upward function of $|\alpha|$. Let the distance between the vectors $X^{(N)} = (X_1, X_2, \ldots, X_N)$ and $Y^{(N)} = (Y_1, Y_2, \ldots, Y_N)$ be measured by

$$d(X^{(N)}, Y^{(N)}) = \sum_{i=1}^{N} f(|X_i - Y_i|) \qquad (A-1)$$

Let $V^{(N)} = (V_1, V_2, \ldots, V_N)$ be any vector for which $V_1 \ge V_2 \ge \ldots \ge V_N$, and let B be a block code whose codewords $Y^{(N)}$ are all distinct permutations of $V^{(N)}$. Then if $X_{i_k}$ denotes the kth largest component of $X^{(N)}$, the $Y^{(N)} \in B$ that minimizes $d(X^{(N)}, Y^{(N)})$ has $Y_{i_k} = V_k$ for k = 1, 2, ..., N.

Proof: From the additive nature of Equation (A-1), it suffices to show that if $X_1 \ge X_2 \ge \ldots \ge X_N$, then $(V_1, V_2, \ldots, V_N) = V^{(N)}$ is the $Y^{(N)} \in B$ that minimizes $d(X^{(N)}, Y^{(N)})$. Furthermore, once this has been established for N = 2, it is easily established for N > 2 by induction.

When N = 2, there are six cases to consider, namely

Case 1   $x_1 \ge v_1 \ge v_2 \ge x_2$        Case 4   $v_1 \ge x_1 \ge x_2 \ge v_2$

Case 2   $x_1 \ge v_1 \ge x_2 \ge v_2$        Case 5   $v_1 \ge x_1 \ge v_2 \ge x_2$

Case 3   $x_1 \ge x_2 \ge v_1 \ge v_2$        Case 6   $v_1 \ge v_2 \ge x_1 \ge x_2$

In each case we must establish that

$$f(|x_1 - v_1|) + f(|x_2 - v_2|) \le f(|x_1 - v_2|) + f(|x_2 - v_1|) \qquad (A-2)$$




Since $f(\cdot)$ is a function of the absolute value of its argument, Cases 4, 5, and 6 will follow immediately from the establishment of Equation (A-2) for Cases 1, 2, and 3.

Case 1: We have $v_1 - x_2 \ge v_2 - x_2 \ge 0$ and $x_1 - v_2 \ge x_1 - v_1 \ge 0$. Hence, (A-2) follows from the monotonicity of $f(\cdot)$.

Before treating Cases 2 and 3, we note that if we can establish Equation (A-2) for $\hat{f}(|x-v|) = f(|x-v|) - f(0)$, then it will clearly hold for $f(\cdot)$ as well. Hence we lose no generality by assuming $f(0) = 0$.

Lemma: For $a \ge 0$, $b \ge 0$, $f(a) + f(b) \le f(a+b)$.

Proof: See Figure A-1. A straight line is drawn through the points $(a, f(a))$ and $(b, f(b))$. Since $f(\cdot)$ is convex upward and $f(0) = 0$, the line intersects the abscissa at a nonnegative value. Triangles $T_1$ and $T_2$ are similar. The base of $T_2$ is larger than the base of $T_1$, so the altitude of $T_2$ is larger than $f(a)$, the altitude of $T_1$. Thus the straight line passes through the point $(a+b, h)$ where

$$f(a) + f(b) \le h \le f(a+b). \qquad \text{QED}$$

Case 2: We have

$$f(|x_1 - v_1|) + f(|x_2 - v_2|) \le f(|x_1 - x_2|) + f(|x_2 - v_2|) \le f(|x_1 - v_2|) \le f(|x_1 - v_2|) + f(|x_2 - v_1|)$$

where the first inequality follows from $x_2 \le v_1$ and monotonicity, the second from the lemma, and the third from nonnegativity.

Case 3: We have

$$f(|x_2 - v_1|) \le \min[f(|x_1 - v_1|), f(|x_2 - v_2|)] \le \max[f(|x_1 - v_1|), f(|x_2 - v_2|)] \le f(|x_1 - v_2|)$$

Let $\hat{f}(|\alpha|) = f(|\alpha + x_2 - v_1|) - f(|x_2 - v_1|)$. Applying the lemma to $\hat{f}(\cdot)$ yields

$$f(|x_1 - v_1|) + f(|x_2 - v_2|) = \hat{f}(|x_1 - x_2|) + \hat{f}(|v_1 - v_2|) + 2f(|x_2 - v_1|)$$

$$\le \hat{f}(|x_1 - x_2 + v_1 - v_2|) + 2f(|x_2 - v_1|)$$

$$= f(|x_1 - v_2|) + f(|x_2 - v_1|) \qquad \text{QED}$$
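The theorem says that the best codeword is found by rank matching: the kth largest codeword component goes in the position of the kth largest source component. A minimal sketch of that encoding rule, together with a brute-force check over all permutations for a small example, is given here. The function names and the sample vectors are assumptions for illustration, and the squared-error choice of f is only one admissible distance function.

```python
from itertools import permutations


def square(t):
    """Example distance function f; nonnegative, nondecreasing, convex on t >= 0."""
    return t * t


def encode(x, v):
    """Return the permutation of v whose kth largest entry sits where x has its kth largest entry."""
    v_sorted = sorted(v, reverse=True)
    order = sorted(range(len(x)), key=lambda i: x[i], reverse=True)
    y = [0.0] * len(x)
    for k, i in enumerate(order):
        y[i] = v_sorted[k]
    return y


def distance(x, y, f=square):
    """d(X, Y) = sum_i f(|X_i - Y_i|), as in Equation (A-1)."""
    return sum(f(abs(xi - yi)) for xi, yi in zip(x, y))


def brute_force_min(x, v, f=square):
    """Smallest distance over every permutation of v (feasible only for small N)."""
    return min(distance(x, p, f) for p in permutations(v))


x_example = [0.3, -1.2, 2.5, 0.9]
v_example = [1.5, 0.5, -0.5, -1.5]
y_example = encode(x_example, v_example)
```

Rank matching costs one sort of the source block, whereas the exhaustive search grows factorially with N.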


B. Best Choice of $\mu_j$ for Mean-Square Error Criterion

Let $X^{(i)}$ denote the ith largest of N random variables, each with mean zero and variance $\sigma^2$. Let $|X|^{(i)}$ denote the ith largest of the absolute values of these random variables. Then the mean-squared errors for Variant I and Variant II codes are:

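Variant I

$$D = \frac{1}{N} E\left[ \sum_{j=1}^{K} \; \sum_{i=n_1+\ldots+n_{j-1}+1}^{n_1+\ldots+n_j} \left( X^{(i)} - \mu_j \right)^2 \right] \qquad (B-1)$$

Variant II

$$D = \frac{1}{N} E\left[ \sum_{j=1}^{K} \; \sum_{i=n_1+\ldots+n_{j-1}+1}^{n_1+\ldots+n_j} \left( |X|^{(i)} - \mu_j \right)^2 \right] \qquad (B-2)$$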


i-1 


Noting that 
N 



(i)A 



i=l 


2 

a 


these equations can be rewritten as 


Variant I 



K 


j=l 


,(B-3) 



(B-4) 


Variant II D = <7 - 2 





Differentiating Equations (B-3) and (B-4) with respect to $\mu_j$, setting
the result equal to zero and solving for $\mu_j$ results in the expressions
given in Equations (11) and (12).


C. Monotonicity of $n_i$ for Minimum Distortion

Let $\alpha_i$ be the appropriate ith order statistic for Variant I or
Variant II codes. That is,

ff i 



Variant 'I 


Variant II 


(C-l) 


Then from Equations (11) and (12), the optimum values of the $\mu_j$ which
minimize the mean-square error are


n +n„+, . ,+n . 


J 



i-.ttj+» • •+ n j -j+1 


(C-2) 


and the resulting mean-square error distortion (from Equation (13)) is 


D 


2 

a 



K 


I-.-5 


(C-3) 


Choose an J? such that > 0 , i = 1, 2, . . . . .4ng , and let 

a = n g-l > n j? = b (C-4) 

It will now be shown that if all the other $n_i$ ($i \ne \ell-1$ or $\ell$) remain fixed, the distortion given by (C-3) can be made smaller by reversing the roles of $n_{\ell-1}$ and $n_\ell$. That is, define a new set of groupings $n_i'$, given as


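$$n_i' = n_i, \qquad i \ne \ell-1 \text{ or } \ell$$

$$n_\ell' = n_{\ell-1}$$

$$n_{\ell-1}' = n_\ell \qquad (C-5)$$

Then

$$D' = \sigma^2 - \frac{1}{N} \sum_{j=1}^{K} n_j' (\mu_j')^2 < \sigma^2 - \frac{1}{N} \sum_{j=1}^{K} n_j \mu_j^2 = D \qquad (C-6)$$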


Proof 


Let $L = n_1 + n_2 + \ldots + n_{\ell-2}$. Then D - D' can be written as

1 2 1 ^ 

D - D ’ ‘ a te L+l + ' ' • + °'L+a ) + b K-U+a + ' • ' + c, L+a+b ) 


1 2 1 2 
b (o, L+l + - • * + °W + a (o L+b+l + - * + °'L+a+b ) 


(C~7) 


After some manipulation, this can be written as 

D “ D ' = Sb [K + -%a ) ’ ( W“ <+ %+b ) ] X 

jT b - a) (a L+1 +. . .dtv L+a )+(b-a) (tf L+b+1 +. . •+0' L+a+b ) - a («L+a+l + ’ • 'X+Jj 


Now 


(q L + * • * +a W > ^+b+l + * • * + £+a+b ) 


(C-8) 


(C-9) 


so the first bracket is nonnegative. The following inequalities establish 
that the second bracket is nonnegative: 

(b " a)(a, L+l + *-‘ +CV L+a ) ^ (b “ a) a o- L+a 


(C-10) 



(b ' a) ( *W-tH-l + * ’ •• +w L+a+b ) ^ (b_a)a °Wb 


3 ( °%+a+l + ' * , " hy L+b ) - (b-a)a a L+a+1 


(C-ll) 

(C-12) 


The second bracket in Equation (C-8) is then bounded from below by


[ ] > (b-a)a Q. 


(b-a)a | a L+a + a L+ . +b - C' 1+ . +1 


Thus D - D' > 0, as was to be proved.


a* 


(C-13) 


D. Algorithm that Determines Almost Optimal Grouping ($n_1, n_2, \ldots, n_K$) for Permutation Codes

1. Choose N and R.

2. Initially set K as the smallest even integer such that $\log_2 K \ge R$.

3. Initially set the groupings to be approximately equal. (If K divides N set $n_i = N/K$ for all i.)

4. Compute $\mu_1, \mu_2, \ldots, \mu_K$.

5. Set $\beta = 1$. Solve Equation (20) for $p_i$. Adjust $\beta$ until Equation (18) is satisfied for the desired rate.

6. Compute $n_i$ as the closest integers to $p_i N$ such that $\sum_{i=1}^{K} n_i = N$.

7. Test if any $n_i = 0$. If yes, proceed to step 11. If not, proceed to step 8.

8. Test if new set of $n_i$ agree with old set of $n_i$. If yes, proceed to step 9. If no, go back to step 4.

9. Store $n_1, n_2, \ldots, n_K$ and the exact values of R and D corresponding to this partitioning.

10. Let $K \leftarrow K + 2$ and start with new grouping closely approximating grouping stored in step 9. (For Variant I codes, let $n_1 = n_K = 1$ and $n_2, \ldots, n_{K-1}$ be the same as the grouping stored in 9 except that the largest $n_i$ has been reduced by 2.) Return to step 4.

11. Print $n_1, n_2, \ldots, n_K$; R and D stored in step 9. If K is odd go to step 14.

12. Set K as smallest odd integer such that $\log_2 K \ge R$.

13. Return to step 3.

14. Stop.
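Two of the steps above are simple enough to illustrate in isolation. The sketch below shows one possible way to carry out step 6 (rounding the $p_i N$ to integers that sum to N, here by the largest-remainder rule, which is an assumption and not necessarily the rule the authors used). It also shows how an exact rate could be computed for a grouping by counting distinct permutations, $N!/(n_1! \cdots n_K!)$. The rate function covers only the permutation count; any extra sign bits of Variant II codes are not included. Both function names are hypothetical.

```python
import math


def round_groups(p, n_total):
    """Round p_i * N to nonnegative integers summing to N (largest-remainder rule).

    Assumes the p_i are nonnegative and sum to 1.
    """
    raw = [pi * n_total for pi in p]
    groups = [int(math.floor(r)) for r in raw]
    leftover = n_total - sum(groups)
    # Hand the remaining units to the entries with the largest fractional parts.
    by_remainder = sorted(range(len(p)), key=lambda i: raw[i] - groups[i], reverse=True)
    for i in by_remainder[:leftover]:
        groups[i] += 1
    return groups


def permutation_rate(groups):
    """Bits per source letter needed to index all distinct permutations of a
    vector with groups[i] equal components in group i."""
    n_total = sum(groups)
    count = math.factorial(n_total)
    for n_i in groups:
        count //= math.factorial(n_i)
    return math.log2(count) / n_total


groups_example = round_groups([0.5, 0.25, 0.25], 8)
rate_example = permutation_rate(groups_example)
```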

E. Binary Coding of Permutations

Encoding Algorithm



3. 


I(i)-e-n i i - 1,2, 

I(0)«“ 0 

A <r-0 

% j?+l 

V 1 

TT 4 — * TT + P y ' I(i) 
i=0 


,K 


4. If $ - N-l, go to (8) 


5. 


P <— P 


Kir) 

N-J? 


Otherwise continue 


6. I(^) Ity) - 1 

7. Go to 2 

8. j) 0 

9 . $ -f-i 

~0 

10. If TT < 2 , Sy<r~0 t otherwise (s^*^— 0 and tt <— it - 2 *) 

11. If < Q go to (8). Otherwise stop. 
Decoding Algorithm


1. 


2 . 


3. 


4. 


5. 


6 . 


7. 


8 . 


9. 


10 . 


11 . 


12 . 

13. 


14. 


15. 

16. 


P <— N 



I(i) = n. i = 1,2,..., K 

H<—o 
# <—J£+ 1 

R 4— 0 
±< — 0 
i< — i+1 

R°*“ R + 1(1) 

If R <• P , go to (4), otherwise continue.. 



If J < N-l continue, otherwise go to (12). 
-P«— (P-R + -I(j A )) (N-j|)/I(jj) 

Go to (2) 

i 4™ 0 
i 1 + -1 

If I(i) =0, go to (14) , - otherwise continue 


17. 


Stop . 
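Binary coding of a permutation codeword amounts to mapping each arrangement to an integer index and back. A minimal sketch of one standard way to do this, lexicographic enumerative ranking of arrangements with repeated symbols, is given here. It is offered as an illustration of the idea and is not claimed to match the step-by-step procedures of this section. The names `rank`, `unrank`, and `to_bits`, and the convention that each position holds a 0-based group label, are assumptions.

```python
from math import factorial


def count_perms(counts):
    """Number of distinct arrangements with counts[g] copies of group label g."""
    total = factorial(sum(counts))
    for c in counts:
        total //= factorial(c)
    return total


def rank(seq, counts):
    """Lexicographic index of seq among all arrangements with the given counts."""
    counts = list(counts)
    index = 0
    for sym in seq:
        # Count arrangements that start with a smaller label at this position.
        for smaller in range(sym):
            if counts[smaller] > 0:
                counts[smaller] -= 1
                index += count_perms(counts)
                counts[smaller] += 1
        counts[sym] -= 1
    return index


def unrank(index, counts):
    """Inverse of rank: rebuild the arrangement from its index."""
    counts = list(counts)
    seq = []
    for _ in range(sum(counts)):
        for sym in range(len(counts)):
            if counts[sym] == 0:
                continue
            counts[sym] -= 1
            block = count_perms(counts)
            if index < block:
                seq.append(sym)
                break
            index -= block
            counts[sym] += 1
    return seq


def to_bits(index, counts):
    """Fixed-width binary string long enough to hold every index."""
    width = max(1, (count_perms(counts) - 1).bit_length())
    return format(index, "0{}b".format(width))


counts_example = [2, 1, 1]   # group sizes n_1, n_2, n_3 with N = 4
bits_example = to_bits(rank([2, 1, 0, 0], counts_example), counts_example)
```

The index uses exact integer arithmetic, so the round trip is lossless for any block length the factorials can accommodate.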



